---
license: mit
language:
- en
pipeline_tag: token-classification
inference: false
tags:
- token-classification
- entity-recognition
- generic
- feature-extraction
- foundation-model
---

# SOTA Entity Recognition V1 foundation model by NuMind 🔥

This model provides state-of-the-art embeddings for the Entity Recognition task.

**Check out other models by NuMind:**
* SOTA multilingual Entity Recognition foundation model: [link](https://huggingface.co/numind/entity-recognition-multilingual-general-sota-v1)
* SOTA Sentiment Analysis foundation model: [English](https://huggingface.co/numind/generic-sentiment-v1), [Multilingual](https://huggingface.co/numind/generic-sentiment-multi-v1)

## About

This model is [RoBERTa-base](https://huggingface.co/roberta-base) fine-tuned on an artificially annotated subset of [C4](https://huggingface.co/datasets/c4).

**Results:**

## Usage

Embeddings can be used out of the box or fine-tuned on specific datasets.

Get embeddings:

```python
import torch
import transformers

# Load the model with hidden states exposed so intermediate layers can be used
model = transformers.AutoModel.from_pretrained(
    'numind/entity-recognition-general-sota-v1',
    output_hidden_states=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    'numind/entity-recognition-general-sota-v1'
)

text = [
    "NuMind is an AI company based in Paris and USA.",
    "See other models from us on https://huggingface.co/numind"
]
# Tokenize the sentences as one padded batch of PyTorch tensors
encoded_input = tokenizer(
    text,
    return_tensors='pt',
    padding=True,
    truncation=True
)
output = model(**encoded_input)

# for better quality: concatenate the last layer with an intermediate layer
# (result shape: batch x tokens x 2 * hidden_size)
emb = torch.cat(
    (output.hidden_states[-1], output.hidden_states[-7]),
    dim=2
)

# for better speed: use the last layer only
# emb = output.hidden_states[-1]
```
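
As an illustration of using the embeddings out of the box, the sketch below trains a single linear layer on top of frozen token embeddings. It is a minimal example under stated assumptions, not NuMind's training recipe. The random `emb` tensor stands in for the real embeddings (1536 features assumes the two concatenated 768-dimensional RoBERTa-base layers), and `num_labels`, the label tensor, and the padding layout are hypothetical placeholders.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumption: random stand-in for `emb` from the snippet above
# (2 sentences, 16 tokens, 2 * 768 features for the concatenated layers).
emb = torch.randn(2, 16, 1536)

# Assumption: stand-in for encoded_input['attention_mask'];
# here the second sentence ends with 4 padding tokens.
attention_mask = torch.ones(2, 16, dtype=torch.long)
attention_mask[1, 12:] = 0

num_labels = 5  # hypothetical tag set, e.g. O, B-ORG, I-ORG, B-LOC, I-LOC
head = nn.Linear(emb.shape[-1], num_labels)

# Per-token scores: (batch, tokens, num_labels)
logits = head(emb)

# Placeholder gold labels; padding positions are set to -100 so the loss skips them
labels = torch.randint(0, num_labels, (2, 16))
labels[attention_mask == 0] = -100

loss = nn.functional.cross_entropy(
    logits.view(-1, num_labels),
    labels.view(-1),
    ignore_index=-100,
)
loss.backward()  # gradients reach only the linear head in this sketch

predictions = logits.argmax(dim=-1)  # predicted label id per token
```

With real data, `emb` and `attention_mask` would come from the model and tokenizer outputs, and the labels would need to be aligned to the tokenizer's subword tokens.
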

## Contact

Sergei Bogdanov: [email protected] |