---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/V5nB1X_qdyTtyTUZHYYHk.png)

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.

This model's architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and extends GLiNER beyond matching spans against user-supplied labels.

By integrating modern decoders pretrained on vast datasets, GLiNER can draw on their **richer knowledge** while maintaining competitive inference speed.
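
GLiNER-style encoders score candidate text spans rather than individual tokens. As a rough illustration only, the sketch below enumerates every word span up to a maximum width; whitespace tokenization, the `enumerate_spans` helper, and the `max_width` value are assumptions for the example, not this model's actual tokenizer or configuration.

```python
def enumerate_spans(text: str, max_width: int = 3) -> list[tuple[int, int, str]]:
    """Return (start_word, end_word, span_text) for all spans up to max_width words.

    Assumption: whitespace tokenization; real models use subword tokenizers.
    end_word is exclusive.
    """
    words = text.split()
    spans = []
    for start in range(len(words)):
        # Consider spans of 1..max_width words beginning at `start`
        for width in range(1, max_width + 1):
            end = start + width
            if end > len(words):
                break  # span would run past the end of the text
            spans.append((start, end, " ".join(words[start:end])))
    return spans


if __name__ == "__main__":
    # Prints 7 spans, e.g. (0, 2, 'Steve Jobs') and (3, 4, 'Apple')
    for span in enumerate_spans("Steve Jobs founded Apple", max_width=2):
        print(span)
```

In the architecture described above, each candidate span would be represented by the encoder, and the spans the model keeps would be passed to the decoder for label generation; the span limits this particular model uses are not stated here.
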

---

## Key Features

* **Open ontology**: Works even when the label set is not known in advance
* **Multi-label entity recognition**: Assigns multiple labels to a single entity
* **Entity linking**: Handles large label sets via constrained generation
* **Knowledge expansion**: Benefits from the knowledge stored in large pretrained decoder models
* **Efficient**: Minimal speed reduction on GPU compared with encoder-only GLiNER models
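
When a model can assign several labels to the same span (multi-label recognition), downstream code often wants a single record per span. The sketch below groups predictions by `(start, end)`. The input dictionaries mirror the `start`/`end`/`text`/`label` fields shown in this card's example output, but the `merge_labels` helper and the sample predictions are hypothetical.

```python
def merge_labels(entities: list[dict]) -> list[dict]:
    """Collapse predictions that share a span into one record holding all labels.

    Hypothetical helper: assumes each entity dict has start, end, text and label.
    Labels keep their first-seen order; duplicates are dropped.
    """
    grouped: dict[tuple[int, int], dict] = {}
    for ent in entities:
        key = (ent["start"], ent["end"])
        record = grouped.setdefault(
            key,
            {"start": ent["start"], "end": ent["end"], "text": ent["text"], "labels": []},
        )
        if ent["label"] not in record["labels"]:
            record["labels"].append(ent["label"])
    # Return spans in document order
    return [grouped[key] for key in sorted(grouped)]


if __name__ == "__main__":
    # Hypothetical predictions for "Apple was co-founded by Steve Jobs."
    predictions = [
        {"start": 0, "end": 5, "text": "Apple", "label": "organization"},
        {"start": 0, "end": 5, "text": "Apple", "label": "company"},
        {"start": 24, "end": 34, "text": "Steve Jobs", "label": "person"},
    ]
    for record in merge_labels(predictions):
        print(record)
```

Keying on character offsets rather than on the span text keeps two separate mentions of the same string (such as the two occurrences of "Apple" in the usage example) apart.
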

---

## Installation

Install or upgrade to the latest version of GLiNER:

```bash
pip install -U gliner
```

---

## Usage

```python
from gliner import GLiNER

# Load the model from the Hugging Face Hub (the repo id includes the namespace)
model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)
# Entity types to match; the decoder additionally generates its own label names
labels = ["person", "other"]

# threshold: minimum score for an entity to be returned
entities = model.run(text, labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```

---

### Example Output

In each entity, `label` is the matching entry from `labels`, while `generated labels` holds the label names produced by the decoder:

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "other",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "other",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
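
The result above is a nested list of entity dictionaries. A small post-processing sketch follows; it assumes the result has exactly this shape (in particular that the outer list holds one inner list per input text, which is an assumption) and that the decoder's output sits under the `"generated labels"` key. The `summarize` helper and its `min_score` cutoff are hypothetical; the sample data is an abbreviated copy of the output above.

```python
def summarize(results: list[list[dict]], min_score: float = 0.5) -> list[tuple[str, str, str]]:
    """Return (text, label, first generated label) for entities scoring >= min_score.

    Assumes the nested shape shown in the example output.
    """
    summary = []
    for entities in results:  # assumption: one inner list per input text
        for ent in entities:
            if ent["score"] < min_score:
                continue  # drop low-confidence entities
            generated = ent.get("generated labels") or ["<none>"]
            summary.append((ent["text"], ent["label"], generated[0]))
    return summary


if __name__ == "__main__":
    # Abbreviated copy of the example output above
    results = [[
        {"start": 21, "end": 26, "text": "Apple", "label": "other",
         "score": 0.6795641779899597, "generated labels": ["Organization"]},
        {"start": 47, "end": 60, "text": "April 1, 1976", "label": "other",
         "score": 0.44296327233314514, "generated labels": ["Date"]},
        {"start": 65, "end": 78, "text": "Steve Wozniak", "label": "person",
         "score": 0.9934439659118652, "generated labels": ["Person"]},
    ]]
    # Prints ('Apple', 'other', 'Organization') and ('Steve Wozniak', 'person', 'Person')
    for row in summarize(results, min_score=0.5):
        print(row)
```

Pairing the coarse `label` with the generated one is one way to use a broad input label such as `"other"` as a catch-all while still recovering a specific type from the decoder.
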

---

### Restricting the Decoder

You can limit the decoder to generating labels only from a predefined set by passing `gen_constraints`:

```python
model.run(
    text, labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```
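
Restricting generation to a fixed label set is commonly done with a prefix trie over the tokenized labels: at each decoding step, only tokens that extend some allowed label are permitted. The sketch below illustrates that general technique using the constraint list above. It treats whitespace-split words as tokens, uses a made-up `<eos>` end marker, and replaces real decoder scores with a fixed dictionary; `LabelTrie` and `greedy_constrained_decode` are hypothetical names, and this is not GLiNER's internal implementation.

```python
END = "<eos>"  # assumed end-of-label marker


class LabelTrie:
    """Prefix trie over tokenized labels (tokens are whitespace-split words here)."""

    def __init__(self, labels: list[str]) -> None:
        self.root: dict = {}
        for label in labels:
            node = self.root
            # Every label path ends with END so complete labels can be detected
            for token in label.split() + [END]:
                node = node.setdefault(token, {})

    def allowed_next(self, prefix: list[str]) -> list[str]:
        """Return tokens that may follow `prefix`; empty if the prefix is not in the trie."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)


def greedy_constrained_decode(trie: LabelTrie, scores: dict[str, float]) -> list[str]:
    """Pick the highest-scoring allowed token at each step until END is chosen.

    `scores` is a hypothetical stand-in for decoder scores, reused at every step.
    Tokens without a score are treated as -inf.
    """
    output: list[str] = []
    while True:
        candidates = trie.allowed_next(output)
        best = max(candidates, key=lambda tok: scores.get(tok, float("-inf")))
        if best == END:
            return output
        output.append(best)


if __name__ == "__main__":
    trie = LabelTrie(["organization", "organization type", "city",
                      "technology", "date", "person"])
    print(trie.allowed_next([]))                # ['city', 'date', 'organization', 'person', 'technology']
    print(trie.allowed_next(["organization"]))  # ['<eos>', 'type']
    print(greedy_constrained_decode(trie, {"organization": 2.0, "type": 1.0, END: 0.5}))
    # ['organization', 'type']
```

Because `"organization"` and `"organization type"` share a prefix, that prefix is stored only once. This sharing is what lets a trie scale to very large label sets.
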

---

## Performance Tips

Two implementations of the label trie (used when restricting generation to a label set) are available: a default implementation and a faster, more memory-efficient C++ version. To use the C++ version, install **Cython**:

```bash
pip install cython
```

This can significantly improve performance and reduce memory usage, especially for label sets containing millions of labels.
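
A quick standard-library check can confirm that Cython is importable in the environment where GLiNER runs. It only verifies that the `Cython` package is installed; it does not confirm which trie implementation GLiNER actually selects. The `cython_available` helper is hypothetical.

```python
import importlib.util


def cython_available() -> bool:
    """Return True if the Cython package can be imported in this environment."""
    return importlib.util.find_spec("Cython") is not None


if __name__ == "__main__":
    print("Cython installed:", cython_available())
```
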