@@ -1,92 +1,293 @@
-# Jina Embeddings V4
-## Examples
-Encode functions:
-```python
-import torch
-from transformers import AutoModel
-from PIL import Image
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-# Load model
-model = AutoModel.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
-model = model.to(device)
-# Sample data
-texts = ["Here is some sample code", "This is a matching text"]
-image_paths = ['/<path_to_image>']
-images = [Image.open(path) for path in image_paths]
-# Example 1: Text matching task with single vector embeddings
-# Generate embeddings with dimension truncation (256), decrease max_pixels
-img_embeddings = model.encode_images(images=images, truncate_dim=256, max_pixels=602112, task='text-matching')
-text_embeddings = model.encode_texts(texts=texts, truncate_dim=256, max_length=512, task='text-matching')
-# Example 2: Retrieval task with multi-vector embeddings
-model.set_task(task='retrieval')
-# Generate multi-vector embeddings
-img_embeddings = model.encode_images(images=images, vector_type='multi_vector')
-text_embeddings = model.encode_texts(texts=texts, vector_type='multi_vector', prompt_name='passage')
-# Example 3: Code task with single vector embeddings
-code = ["def hello_world():\n    print('Hello, World!')"]
-code_embeddings = model.encode_texts(texts=code, task='code')
-```
-Using the model forward:
-```python
-import torch
-from transformers import AutoModel, AutoProcessor
-from PIL import Image
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-# Load model and processor
-model = AutoModel.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
-model = model.to(device)
-processor = AutoProcessor.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
-# Sample data
-texts = ["Here is some sample code", "This is a matching text"]
-image_paths = ['/<path_to_image>']
-# Process text and images
-text_batch = processor.process_texts(texts=texts, prefix="Query", max_length=512)
-images = [Image.open(path) for path in image_paths]
-image_batch = processor.process_images(images=images)
-# Forward pass
-model.eval()
-with torch.no_grad():
-    text_batch = {k: v.to(device) for k, v in text_batch.items()}
-    image_batch = {k: v.to(device) for k, v in image_batch.items()}
-    with torch.autocast(device_type='cuda' if torch.cuda.is_available() else 'cpu'):
-        # Get embeddings
-        text_embeddings = model.model(**text_batch, task_label='retrieval').single_vec_emb
-        img_embeddings = model.model(**image_batch, task_label='retrieval').single_vec_emb
 ```
-Inference via the `SentenceTransformer` library:
 ```python
 from sentence_transformers import SentenceTransformer
-model = SentenceTransformer(
-    'jinaai/jina-embeddings-v4', trust_remote_code=True
 )
-emb = model.encode(['Khinkali is the best'], task='retrieval', prompt_name='query')
-```

+<br><br>
+<p align="center">
+<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
+</p>
+<p align="center">
+<b>The embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
+</p>
+<p align="center">
+<b>Jina Embeddings v4: Multilingual Multimodal Embeddings</b>
+</p>
+This model is based on the paper [jina-embeddings-v4: Multilingual Multimodal Embeddings](https://puginarug.com/).
+## Quick Start
+[Blog](https://alwaysjudgeabookbyitscover.com/) | [Technical Report](https://puginarug.com) | [API](https://jina.ai/embeddings)
+## Intended Usage & Model Info
+`jina-embeddings-v4` is a multilingual, multimodal embedding model designed for unified representation of text and images.
+The model is specialized for complex document retrieval, including visually rich documents with charts, tables, and illustrations.
+Embeddings produced by `jina-embeddings-v4` serve as the backbone for neural information retrieval and multimodal GenAI applications.
+Built on [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), `jina-embeddings-v4` has the following features:
+- **Unified embeddings** for text, images, and documents, supporting both dense (single-vector) and late-interaction (multi-vector) retrieval.
+- **Multilingual support** (20+ languages) and compatibility with a wide range of domains, including technical and visually complex documents.
+- **Task-specific adapters** for retrieval, text matching, and code-related tasks, which can be selected at inference time.
+- **Flexible embedding size**: dense embeddings are 2048 dimensions by default but can be truncated to as low as 128 with minimal performance loss.
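The dense vs. late-interaction distinction above can be illustrated with a small scoring sketch. It assumes that multi-vector output is one vector per token and that documents are scored ColBERT-style (MaxSim: each query vector is matched to its most similar document vector, and those maxima are summed). This card does not specify the exact scoring function, and the helper names and toy 2-d vectors are illustrative only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction (ColBERT-style MaxSim) score.

    Each query vector is matched to its most similar document vector,
    and those per-query maxima are summed.
    """
    if not query_vecs or not doc_vecs:
        raise ValueError("both inputs need at least one vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-d vectors standing in for the model's 128-d per-token outputs.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # one vector aligns with each query vector
doc_b = [[0.5, 0.5]]              # a single, less specific vector

print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))  # True
```

Dense retrieval, by contrast, compares one pooled vector per input, which is cheaper to store; late interaction keeps per-token detail at the cost of storing many vectors per document.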
+Summary of features:
+| Feature   | Jina Embeddings V4   |
+|------------|------------|
+| Base Model | Qwen2.5-VL-3B-Instruct |
+| Supported Tasks | Retrieval, Text Matching, Code |
+| Model DType | BFloat16 |
+| Max Sequence Length | 8192 |
+| Single-Vector Dimension | 2048 |
+| Multi-Vector Dimension | 128 |
+| Matryoshka dimensions | 128, 256, 512, 1024, 2048 |
+| Attention Mechanism | FlashAttention2 |
+| Pooling Strategy | Mean pooling |
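The pooling and Matryoshka rows of the table can be read as a two-step recipe for single-vector embeddings. The sketch below assumes mean pooling over non-padding token states, and that truncation keeps the leading components and then re-normalizes (the usual Matryoshka convention). The function names are hypothetical; in practice the `truncate_dim` argument of the encode functions handles truncation for you.

```python
import math
from typing import List, Sequence

# Matryoshka dimensions listed in the table above.
ALLOWED_DIMS = (128, 256, 512, 1024, 2048)


def mean_pool(token_states: Sequence[Sequence[float]],
              mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, m in zip(token_states, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the leading `dim` components, then rescale to unit L2 norm."""
    if dim not in ALLOWED_DIMS:
        raise ValueError(f"dim must be one of {ALLOWED_DIMS}")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


# Three toy 2048-d token states; the last one is padding (mask = 0).
states = [[1.0] * 2048, [3.0] * 2048, [100.0] * 2048]
pooled = mean_pool(states, mask=[1, 1, 0])
small = truncate_and_normalize(pooled, 128)
print(len(small))  # 128
```

Re-normalizing after truncation keeps cosine and dot-product scores comparable across the different embedding sizes.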
+## Training, Data, Parameters
+Please refer to our [technical report of jina-embeddings-v4](https://puginarug.com) for the model and training details.
+## Usage
+<details>
+  <summary>Requirements</summary>
+The following Python packages are required:
+- `transformers>=4.52.0`
+- `torch>=2.6.0`
+- `peft>=0.15.2`
+- `torchvision`
+- `pillow`
+### Optional / Recommended
+- **flash-attention**: Installing [flash-attention](https://github.com/Dao-AILab/flash-attention) is recommended for improved inference speed and efficiency, but not mandatory.
+- **sentence-transformers**: If you want to use the model via the `sentence-transformers` interface, install this package as well.
+</details>
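To confirm that an environment meets the minimums listed under Requirements, something like the following can be run. It is a sketch built on the standard library's `importlib.metadata`; the helper names are ours, and it compares only the numeric release segment, so a pre-release such as `4.52.0.dev0` is treated as `4.52.0`.

```python
from importlib import metadata
from typing import Dict, List, Tuple

# Minimum versions copied from the Requirements list above.
MINIMUMS: Dict[str, str] = {
    "transformers": "4.52.0",
    "torch": "2.6.0",
    "peft": "0.15.2",
}
# Required, but listed without a version pin.
UNPINNED = ("torchvision", "pillow")


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the numeric release, e.g. '2.6.0+cu124' -> (2, 6, 0).

    Local tags (+...) and pre-release suffixes are dropped, which is a
    simplification of full PEP 440 ordering.
    """
    parts: List[int] = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop at the first segment with a suffix, e.g. '0rc1'
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Compare release tuples after padding them to the same length."""
    a, b = parse_release(installed), parse_release(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    for name, minimum in MINIMUMS.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(found, minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    for name in UNPINNED:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All listed requirements are satisfied."]:
        print(line)
```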
+<details>
+  <summary>via Jina AI <a href="https://jina.ai/embeddings/">Embedding API</a></summary>
+Needs to be adjusted for V4
+```bash
+curl https://api.jina.ai/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer [JINA_AI_API_TOKEN]" \
+  -d @- <<EOFEOF
+  {
+    "model": "jina-embeddings-v4",
+    "dimensions": 1024,
+    "task": "retrieval.query",
+    "normalized": true,
+    "embedding_type": "float",
+    "input": [
+        {
+            "text": "غروب جميل على الشاطئ"
+        },
+        {
+            "text": "海滩上美丽的日落"
+        },
+        {
+            "text": "A beautiful sunset over the beach"
+        },
+        {
+            "text": "Un beau coucher de soleil sur la plage"
+        },
+        {
+            "text": "Ein wunderschöner Sonnenuntergang am Strand"
+        },
+        {
+            "text": "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία"
+        },
+        {
+            "text": "समुद्र तट पर एक खूबसूरत सूर्यास्त"
+        },
+        {
+            "text": "Un bellissimo tramonto sulla spiaggia"
+        },
+        {
+            "text": "浜辺に沈む美しい夕日"
+        },
+        {
+            "text": "해변 위로 아름다운 일몰"
+        },
+        {
+            "image": "https://i.ibb.co/nQNGqL0/beach1.jpg"
+        },
+        {
+            "image": "https://i.ibb.co/r5w8hG8/beach2.jpg"
+        }
+    ]
+  }
+EOFEOF
 ```
+</details>
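The same request can be sent from Python using only the standard library. The body below mirrors the curl example field for field, so it carries the same caveat that the example may still change for v4. The `JINA_AI_API_TOKEN` environment variable name is an assumption, and the response is printed without assuming anything about its structure.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List

API_URL = "https://api.jina.ai/v1/embeddings"


def build_payload(inputs: List[Dict[str, str]], dimensions: int = 1024,
                  task: str = "retrieval.query") -> Dict[str, Any]:
    """Build the JSON body used in the curl example above."""
    return {
        "model": "jina-embeddings-v4",
        "dimensions": dimensions,
        "task": task,
        "normalized": True,
        "embedding_type": "float",
        "input": inputs,  # items are {"text": ...} or {"image": <url>}
    }


def embed(inputs: List[Dict[str, str]], token: str) -> Dict[str, Any]:
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(build_payload(inputs)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    token = os.environ.get("JINA_AI_API_TOKEN")  # hypothetical variable name
    if token:
        result = embed([
            {"text": "A beautiful sunset over the beach"},
            {"image": "https://i.ibb.co/nQNGqL0/beach1.jpg"},
        ], token)
        print(list(result.keys()))
```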
+<details>
+  <summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary>
+```python
+# !pip install "transformers>=4.52.0" "torch>=2.6.0" "peft>=0.15.2" torchvision pillow
+from transformers import AutoModel
+# Initialize the model
+model = AutoModel.from_pretrained("jinaai/jina-embeddings-v4", trust_remote_code=True)
+# ========================
+# 1. Retrieval Task
+# ========================
+# Optionally pass truncate_dim, max_length (texts), max_pixels (images), vector_type, or batch_size to the encode functions
+# Encode query
+query_embedding = model.encode_texts(
+    texts=["Overview of climate change impacts on coastal cities"],
+    task="retrieval",
+    prompt_name="query",
+)[0]
+# Encode passage (text)
+passage_embedding = model.encode_texts(
+    texts=[
+        "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
+    ],
+    task="retrieval",
+    prompt_name="passage",
+)[0]
+# Encode image/document
+image_embedding = model.encode_images(
+    images=["https://i.ibb.co/nQNGqL0/beach1.jpg"],
+    task="retrieval",
+)[0]
+# ========================
+# 2. Text Matching Task
+# ========================
+texts = [
+    "غروب جميل على الشاطئ",  # Arabic
+    "海滩上美丽的日落",  # Chinese
+    "Un beau coucher de soleil sur la plage",  # French
+    "Ein wunderschöner Sonnenuntergang am Strand",  # German
+    "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία",  # Greek
+    "समुद्र तट पर एक खूबसूरत सूर्यास्त",  # Hindi
+    "Un bellissimo tramonto sulla spiaggia",  # Italian
+    "浜辺に沈む美しい夕日",  # Japanese
+    "해변 위로 아름다운 일몰",  # Korean
+]
+text_embeddings = model.encode_texts(texts=texts, task="text-matching")
+# ========================
+# 3. Code Understanding Task
+# ========================
+# Encode query
+query_embedding = model.encode_texts(
+    texts=["Find a function that prints a greeting message to the console"],
+    task="code",
+    prompt_name="query",
+)
+# Encode code
+code_embeddings = model.encode_texts(
+    texts=["def hello_world():\n    print('Hello, World!')"],
+    task="code",
+    prompt_name="passage",
+)
+```
+</details>
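Once the query, passage, and image embeddings from the retrieval example are available, candidates can be ranked by cosine similarity. This is a minimal plain-Python sketch; it assumes each embedding can be converted to a list of floats (for example with `.tolist()` if the encode functions return tensors), which this card does not state.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float],
         candidates: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-d vectors; real single-vector embeddings have up to 2048 dimensions.
toy_query = [1.0, 0.0, 0.0]
toy_candidates = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
print([i for i, _ in rank(toy_query, toy_candidates)])  # [2, 1, 0]

# With the retrieval example above (conversion to lists is an assumption):
# rank(query_embedding.tolist(),
#      [passage_embedding.tolist(), image_embedding.tolist()])
```

Scoring a text query against an image embedding this way relies on the unified text/image embedding space described earlier in this card.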
+<details>
+  <summary>via <a href="https://sbert.net/">sentence-transformers</a></summary>
 ```python
 from sentence_transformers import SentenceTransformer
+# Initialize the model
+model = SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True)
+# ========================
+# 1. Retrieval Task
+# ========================
+# Encode query
+query_embedding = model.encode(
+    sentences=["Overview of climate change impacts on coastal cities"],
+    task="retrieval",
+    prompt_name="query",
+)[0]
+# Encode passage (text)
+passage_embedding = model.encode(
+    sentences=[
+        "Climate change has led to rising sea levels, increased frequency of extreme weather events..."
+    ],
+    task="retrieval",
+    prompt_name="passage",
+)[0]
+# Encode image/document
+image_embedding = model.encode(
+    sentences=["https://i.ibb.co/nQNGqL0/beach1.jpg"],
+    task="retrieval",
+)[0]
+# ========================
+# 2. Text Matching Task
+# ========================
+texts = [
+    "غروب جميل على الشاطئ",  # Arabic
+    "海滩上美丽的日落",  # Chinese
+    "Un beau coucher de soleil sur la plage",  # French
+    "Ein wunderschöner Sonnenuntergang am Strand",  # German
+    "Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία",  # Greek
+    "समुद्र तट पर एक खूबसूरत सूर्यास्त",  # Hindi
+    "Un bellissimo tramonto sulla spiaggia",  # Italian
+    "浜辺に沈む美しい夕日",  # Japanese
+    "해변 위로 아름다운 일몰",  # Korean
+]
+text_embeddings = model.encode(sentences=texts, task="text-matching")
+# ========================
+# 3. Code Understanding Task
+# ========================
+# Encode query
+query_embedding = model.encode(
+    sentences=["Find a function that prints a greeting message to the console"],
+    task="code",
+    prompt_name="query",
 )
+# Encode code
+code_embeddings = model.encode(
+    sentences=["def hello_world():\n    print('Hello, World!')"],
+    task="code",
+    prompt_name="passage",
+)
+```
+</details>
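For the text-matching example, a quick sanity check is the all-pairs similarity matrix of the translations, where one would expect high off-diagonal scores. The sketch below normalizes each vector once so that cosine similarity reduces to a dot product. It assumes `text_embeddings` can be converted to nested Python lists (for example with `.tolist()`), and the helper names are illustrative.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similarity_matrix(embeddings: List[Sequence[float]]) -> List[List[float]]:
    """All-pairs cosine similarity: normalize once, then take dot products."""
    unit = [l2_normalize(e) for e in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def mean_off_diagonal(matrix: List[List[float]]) -> float:
    """Average similarity between distinct items (ignores self-similarity)."""
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two items")
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Toy vectors: the first two point the same way, the third is orthogonal.
m = similarity_matrix([[3.0, 4.0], [6.0, 8.0], [4.0, -3.0]])
print(round(mean_off_diagonal(m), 3))  # 0.333

# With the text-matching example above (conversion is an assumption):
# matrix = similarity_matrix(text_embeddings.tolist())
```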
+## License
+This model is licensed to download and run under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). It is available for commercial use via the [Jina Embeddings API](https://jina.ai/embeddings/), [AWS](https://longdogechallenge.com/), [Azure](https://longdogechallenge.com/), and [GCP](https://longdogechallenge.com/). To download for commercial use, please [contact us](https://jina.ai/contact-sales).
+## Contact
+Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
+## Citation
+If you find `jina-embeddings-v4` useful in your research, please cite the following paper: