Ihor committed on
Commit b125efe · verified · 1 Parent(s): b2e0c3c

Create README.md

Files changed (1)
  1. README.md +151 -0
README.md ADDED
@@ -0,0 +1,151 @@
---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-base
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
---

<!-- ![gliner-decoder](image.png)
-->

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/V5nB1X_qdyTtyTUZHYYHk.png)

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.
This architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.
By integrating large, modern decoders trained on vast datasets, GLiNER can draw on their **richer knowledge capacity** while maintaining competitive inference speed.

---

## Key Features

* **Open ontology**: Works even when the label set is not known in advance
* **Multi-label entity recognition**: Assigns multiple labels to a single entity
* **Entity linking**: Handles large label sets via constrained generation
* **Knowledge expansion**: Benefits from the knowledge of large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER

---

## Installation

Update to the latest version of GLiNER:

```bash
pip install -U gliner
```

---

## Usage

```python
from gliner import GLiNER

# Load the pretrained encoder-decoder GLiNER checkpoint
model = GLiNER.from_pretrained("gliner-decoder-base-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

# Labels the encoder uses to classify spans
labels = ["person", "other"]

# threshold: minimum score for a span to be kept
# num_gen_sequences: number of label sequences the decoder generates per entity
results = model.run(text, labels, threshold=0.3, num_gen_sequences=1)
print(results)
```

---

### Example Output

```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "other",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "other",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
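The output is a list with one inner list of entity dictionaries per input text, and `start`/`end` appear to be character offsets into the input with an exclusive `end`. The sketch below works on a hand-copied subset of the example output, so it runs without loading the model. The helpers `check_offsets` and `group_generated_labels` and the `min_score` parameter are hypothetical names introduced here for illustration; they are not part of the GLiNER API.

```python
# Sample text and two entries copied from the example output; the
# list-of-lists shape mirrors it (one inner list per input text).
text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

results = [[
    {"start": 21, "end": 26, "text": "Apple", "label": "other",
     "score": 0.6795641779899597, "generated labels": ["Organization"]},
    {"start": 65, "end": 78, "text": "Steve Wozniak", "label": "person",
     "score": 0.9934439659118652, "generated labels": ["Person"]},
]]


def check_offsets(text, entities):
    """Return True if every entity's [start, end) slice equals its "text" field."""
    return all(text[e["start"]:e["end"]] == e["text"] for e in entities)


def group_generated_labels(entities, min_score=0.5):
    """Map each surface string to its decoder-generated labels (hypothetical helper)."""
    grouped = {}
    for e in entities:
        if e["score"] >= min_score:
            # Note the space in the key name, as it appears in the example output.
            grouped.setdefault(e["text"], []).extend(e["generated labels"])
    return grouped


entities = results[0]  # entities for the first (and only) input text
print(check_offsets(text, entities))
print(group_generated_labels(entities))
```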

---

### Restricting the Decoder

You can restrict the decoder to generate labels only from a predefined set by passing `gen_constraints`:

```python
model.run(
    text, labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```

---

## Performance Tips

Two label trie implementations are available.
To use the **faster, more memory-efficient C++ version**, install **Cython**:

```bash
pip install cython
```

The C++ version can significantly improve speed and reduce memory usage, especially when the label set contains millions of labels.
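
To give a rough idea of why a trie suits large label sets, here is a minimal pure-Python sketch of a label trie that answers "which tokens may come next?" for a partially generated label. It is an illustration only and not GLiNER's implementation: the class `LabelTrie`, the method `allowed_next`, and the `END` marker are hypothetical names, and whitespace splitting stands in for a real tokenizer. It also assumes the trie is what backs constrained label generation, which the text above suggests but does not state outright.

```python
class LabelTrie:
    """Prefix tree over tokenized labels (illustrative, not GLiNER's class)."""

    END = "<end>"  # hypothetical marker meaning "a complete label may stop here"

    def __init__(self, labels):
        self.root = {}
        for label in labels:
            node = self.root
            # Whitespace split stands in for a real subword tokenizer.
            for token in label.split():
                node = node.setdefault(token, {})
            node[self.END] = {}

    def allowed_next(self, prefix):
        """Return the tokens that can follow `prefix`, or [] if the prefix is invalid."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)


trie = LabelTrie(["organization", "organization type", "city", "date", "person"])
print(trie.allowed_next([]))                # candidate first tokens
print(trie.allowed_next(["organization"]))  # stop here or continue with "type"
```

Labels that share a prefix share nodes, so a lookup costs time proportional to the prefix length rather than to the number of labels.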