Ihor committed on
Commit
c4bc283
·
verified ·
1 Parent(s): 27e8d9a

Create README.md

  1. README.md +148 -0
README.md ADDED
@@ -0,0 +1,148 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - microsoft/deberta-v3-large
+ - HuggingFaceTB/SmolLM2-135M-Instruct
+ pipeline_tag: token-classification
+ tags:
+ - NER
+ - encoder
+ - decoder
+ - GLiNER
+ - information-extraction
+ ---
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/V5nB1X_qdyTtyTUZHYYHk.png)
+
+ **GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.
+ This architecture combines:
+
+ * An **encoder** for representing entity spans
+ * A **decoder** for generating label names
+
+ This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.
+ By integrating large modern decoders trained on vast datasets, GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.
+
+ ---
+
+ ## Key Features
+
+ * **Open ontology**: Works even when the label set is not known in advance
+ * **Multi-label entity recognition**: Assigns multiple labels to a single entity
+ * **Entity linking**: Handles large label sets via constrained generation
+ * **Knowledge expansion**: Benefits from the knowledge of large decoder models
+ * **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER
+
+ ---
+
+ ## Installation
+
+ Update to the latest version of GLiNER:
+
+ ```bash
+ pip install -U gliner
+ ```
+
+ ---
+
+ ## Usage
+
+ ```python
+ from gliner import GLiNER
+
+ model = GLiNER.from_pretrained("gliner-decoder-large-v1.0")
+
+ text = (
+     "Apple was founded as Apple Computer Company on April 1, 1976, "
+     "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
+     "develop and sell Wozniak's Apple I personal computer."
+ )
+
+ labels = ["person", "other"]
+
+ model.run(text, labels, threshold=0.3, num_gen_sequences=1)
+ ```
+
+ ---
+
+ ### Example Output
+
+ ```json
+ [
+   [
+     {
+       "start": 21,
+       "end": 26,
+       "text": "Apple",
+       "label": "other",
+       "score": 0.6795641779899597,
+       "generated labels": ["Organization"]
+     },
+     {
+       "start": 47,
+       "end": 60,
+       "text": "April 1, 1976",
+       "label": "other",
+       "score": 0.44296327233314514,
+       "generated labels": ["Date"]
+     },
+     {
+       "start": 65,
+       "end": 78,
+       "text": "Steve Wozniak",
+       "label": "person",
+       "score": 0.9934439659118652,
+       "generated labels": ["Person"]
+     },
+     {
+       "start": 80,
+       "end": 90,
+       "text": "Steve Jobs",
+       "label": "person",
+       "score": 0.9725918769836426,
+       "generated labels": ["Person"]
+     },
+     {
+       "start": 107,
+       "end": 119,
+       "text": "Ronald Wayne",
+       "label": "person",
+       "score": 0.9964536428451538,
+       "generated labels": ["Person"]
+     }
+   ]
+ ]
+ ```
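
The output shown above is a nested list: one inner list per input text, each holding entity dictionaries with character offsets, the matched prompt label, a score, and the decoder's `generated labels`. As a minimal sketch of post-processing, the snippet below groups entity strings by generated label and checks that the offsets index into the original text. It does not call the model; `results` is a hand-copied subset of the example output, and `group_by_generated_label` is a hypothetical helper, not part of the `gliner` package.

```python
from collections import defaultdict

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)

# Two entries copied from the example output (one inner list per input text).
results = [[
    {"start": 21, "end": 26, "text": "Apple", "label": "other",
     "score": 0.6795641779899597, "generated labels": ["Organization"]},
    {"start": 65, "end": 78, "text": "Steve Wozniak", "label": "person",
     "score": 0.9934439659118652, "generated labels": ["Person"]},
]]


def group_by_generated_label(entities, source_text):
    """Map each generated label to the entity strings it was assigned to."""
    grouped = defaultdict(list)
    for ent in entities:
        # Offsets are character positions into the input text.
        assert source_text[ent["start"]:ent["end"]] == ent["text"]
        for gen_label in ent.get("generated labels", []):
            grouped[gen_label].append(ent["text"])
    return dict(grouped)


grouped = group_by_generated_label(results[0], text)
print(grouped)
```

Because an entity may carry several generated labels, the same string can appear under more than one key.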
+
+ ---
+
+ ### Restricting the Decoder
+
+ You can limit the decoder to generate labels only from a predefined set:
+
+ ```python
+ model.run(
+     text, labels,
+     threshold=0.3,
+     num_gen_sequences=1,
+     gen_constraints=[
+         "organization", "organization type", "city",
+         "technology", "date", "person"
+     ]
+ )
+ ```
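
When constraints are in use, it can be helpful to confirm that every generated label really falls inside the allowed set. The sketch below is one way to do that on already-returned results. `labels_outside_constraints` is a hypothetical helper, the last sample entity is made up purely to show a violation, and the case-insensitive comparison is an assumption, since it is not stated here whether generated labels keep the casing of the constraint strings.

```python
gen_constraints = [
    "organization", "organization type", "city",
    "technology", "date", "person",
]

# First two entries mirror the example output; the third is a made-up
# entry (hypothetical) that deliberately violates the constraints.
entities = [
    {"text": "Apple", "generated labels": ["Organization"]},
    {"text": "April 1, 1976", "generated labels": ["Date"]},
    {"text": "Apple I", "generated labels": ["Product"]},
]


def labels_outside_constraints(entities, allowed_labels):
    """Return (entity text, label) pairs whose label is not in the allowed set."""
    # Assumption: compare case-insensitively.
    allowed = {label.casefold() for label in allowed_labels}
    return [
        (ent["text"], label)
        for ent in entities
        for label in ent.get("generated labels", [])
        if label.casefold() not in allowed
    ]


violations = labels_outside_constraints(entities, gen_constraints)
print(violations)
```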
+
+ ---
+
+ ## Performance Tips
+
+ Two implementations of the label trie are available.
+ For the **faster, more memory-efficient C++ version**, install **Cython**:
+
+ ```bash
+ pip install cython
+ ```
+
+ This can significantly improve performance and reduce memory usage, especially with millions of labels.
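
To make the idea of a label trie concrete, here is a minimal pure-Python sketch of a prefix tree over labels that answers "which tokens may come next?" for a partially generated label. This is an illustration only, not GLiNER's implementation: `LabelTrie` is a hypothetical class, and splitting on whitespace stands in for the decoder's real tokenizer, which would operate on token IDs.

```python
class LabelTrie:
    """Prefix tree over tokenized labels (illustrative, not the gliner class)."""

    END = "<end>"  # marker meaning "a complete label ends here"

    def __init__(self, labels, tokenize=str.split):
        # Assumption: whitespace tokenization; a real decoder uses token IDs.
        self.root = {}
        for label in labels:
            node = self.root
            for token in tokenize(label):
                node = node.setdefault(token, {})
            node[self.END] = {}

    def allowed_next(self, prefix_tokens):
        """Tokens that may follow the prefix; empty list if the prefix is invalid."""
        node = self.root
        for token in prefix_tokens:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)


trie = LabelTrie([
    "organization", "organization type", "city",
    "technology", "date", "person",
])

print(trie.allowed_next([]))                # every valid first token
print(trie.allowed_next(["organization"]))  # stop here or continue with "type"
```

During constrained decoding, the generator would only be allowed to pick from the tokens returned for the current prefix. Shared prefixes are stored once, which is why a trie scales to very large label sets.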