Create README.md
README.md
---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
---

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.
The architecture of this model combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.
By integrating large modern decoders trained on vast datasets, GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.

---
## Key Features

* **Open ontology**: Works even when the label set is not known in advance
* **Multi-label entity recognition**: Assigns multiple labels to a single entity
* **Entity linking**: Handles large label sets via constrained generation
* **Knowledge expansion**: Benefits from the knowledge of large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER

---
## Installation

Update to the latest version of GLiNER:

```bash
pip install -U gliner
```
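
To confirm that the upgrade took effect, the installed version can be read from the package metadata. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and not part of GLiNER.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "gliner"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string when gliner is installed, otherwise a short notice.
print(installed_version() or "gliner is not installed")
```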

---

## Usage

```python
from gliner import GLiNER

# If this short name does not resolve, pass the model's full Hugging Face Hub
# repository id ("<namespace>/gliner-decoder-large-v1.0").
model = GLiNER.from_pretrained("gliner-decoder-large-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

labels = ["person", "other"]

# run() works on a batch, so the single text is wrapped in a list;
# the result contains one list of entities per input text.
entities = model.run([text], labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```

---
### Example Output

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "other",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "other",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
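
The nested list in the example output can be post-processed with plain Python. The sketch below assumes the structure shown above (one inner list per input text, with a `"generated labels"` key on each entity). The helper name `group_by_generated_label` and the `min_score` cutoff are illustrative choices, not part of GLiNER.

```python
from collections import defaultdict

# Sample copied from the example output above: one inner list per input text.
results = [
    [
        {"start": 21, "end": 26, "text": "Apple", "label": "other",
         "score": 0.6795641779899597, "generated labels": ["Organization"]},
        {"start": 47, "end": 60, "text": "April 1, 1976", "label": "other",
         "score": 0.44296327233314514, "generated labels": ["Date"]},
        {"start": 65, "end": 78, "text": "Steve Wozniak", "label": "person",
         "score": 0.9934439659118652, "generated labels": ["Person"]},
        {"start": 80, "end": 90, "text": "Steve Jobs", "label": "person",
         "score": 0.9725918769836426, "generated labels": ["Person"]},
        {"start": 107, "end": 119, "text": "Ronald Wayne", "label": "person",
         "score": 0.9964536428451538, "generated labels": ["Person"]},
    ]
]


def group_by_generated_label(entities, min_score=0.5):
    """Map each decoder-generated label to the entity texts that carry it."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue  # skip low-confidence spans
        for generated in entity["generated labels"]:
            grouped[generated].append(entity["text"])
    return dict(grouped)


for entities in results:
    print(group_by_generated_label(entities))
# {'Organization': ['Apple'], 'Person': ['Steve Wozniak', 'Steve Jobs', 'Ronald Wayne']}
```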

---

### Restricting the Decoder

You can limit the decoder to generate labels only from a predefined set:

```python
# Reuses `model`, `text`, and `labels` from the Usage example.
model.run(
    [text], labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```

---
## Performance Tips

Two label trie implementations are available.
For a **faster, memory-efficient C++ version**, install **Cython**:

```bash
pip install cython
```

This can significantly improve performance and reduce memory usage, especially with millions of labels.
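
To illustrate why a trie suits constrained label generation, here is a minimal pure-Python sketch. It is not GLiNER's implementation: the `LabelTrie` class, its `allowed_next` method, and the whitespace tokenization are assumptions made for illustration, whereas a real decoder would operate on subword token ids. At each decoding step, only tokens that extend some valid label are permitted, and finding them requires walking the current prefix rather than scanning every label.

```python
class LabelTrie:
    """Prefix tree over label token sequences (illustrative only)."""

    END = object()  # hypothetical sentinel marking the end of a complete label

    def __init__(self, labels, tokenize=str.split):
        self.root = {}
        for label in labels:
            node = self.root
            for token in tokenize(label):
                node = node.setdefault(token, {})
            node[self.END] = True

    def allowed_next(self, prefix_tokens):
        """Return (sorted allowed next tokens, whether the prefix is a full label)."""
        node = self.root
        for token in prefix_tokens:
            node = node.get(token)
            if node is None:
                return [], False  # prefix matches no label
        tokens = sorted(t for t in node if t is not self.END)
        return tokens, self.END in node


trie = LabelTrie([
    "organization", "organization type", "city",
    "technology", "date", "person",
])
print(trie.allowed_next([]))
# (['city', 'date', 'organization', 'person', 'technology'], False)
print(trie.allowed_next(["organization"]))
# (['type'], True)
```

After the prefix `organization`, generation may either stop (the prefix is already a valid label) or continue with `type`; any other token would be masked out.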