---
license: mit
language:
- en
pipeline_tag: token-classification
inference: false
tags:
- token-classification
- entity-recognition
- foundation-model
- feature-extraction
- RoBERTa
- generic
datasets:
- numind/NuNER
---
# Entity Recognition English Foundation Model by NuMind 🔥
This model provides the best embedding for the Entity Recognition task in English.
**Check out other models by NuMind:**
* SOTA Multilingual Entity Recognition Foundation Model: [link](https://huggingface.co/numind/entity-recognition-multilingual-general-sota-v1)
* SOTA Sentiment Analysis Foundation Model: [English](https://huggingface.co/numind/generic-sentiment-v1), [Multilingual](https://huggingface.co/numind/generic-sentiment-multi-v1)
## About
[Roberta-base](https://huggingface.co/roberta-base) fine-tuned on [NuNER data](https://huggingface.co/datasets/numind/NuNER).
**Metrics:**
Read more about evaluation protocol & datasets in our [paper](https://arxiv.org/abs/2402.15343) and [blog post](https://www.numind.ai/blog/a-foundation-model-for-entity-recognition).
| Model | F1 macro |
|----------|----------|
| RoBERTa-base | 0.7129 |
| ours (last hidden layer) | 0.7500 |
| ours (two hidden layers concatenated) | 0.7686 |
## Usage
Embeddings can be used out of the box or fine-tuned on specific datasets.
Get embeddings:
```python
import torch
import transformers
model = transformers.AutoModel.from_pretrained(
'numind/NuNER-v0.1',
output_hidden_states=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
'numind/NuNER-v0.1'
)
text = [
"NuMind is an AI company based in Paris and USA.",
"See other models from us on https://huggingface.co/numind"
]
encoded_input = tokenizer(
text,
return_tensors='pt',
padding=True,
truncation=True
)
with torch.no_grad():
    output = model(**encoded_input)

# Better quality: concatenate two hidden layers
emb = torch.cat(
    (output.hidden_states[-1], output.hidden_states[-7]),
    dim=2
)

# Better speed: use only the last hidden layer
# emb = output.hidden_states[-1]
```
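To fine-tune on a specific dataset, a token-classification head can be placed on top of these embeddings. A minimal sketch of the idea, independent of the snippet above — the label set and head are hypothetical examples, and the random tensor stands in for the real `emb` (which has shape `(batch, seq_len, 2 * 768)` when two hidden layers are concatenated):

```python
import torch

# Hypothetical label set for illustration (the embeddings themselves
# are label-agnostic): O, B-PER, I-PER, B-ORG, I-ORG
NUM_LABELS = 5
HIDDEN_SIZE = 768  # roberta-base hidden size

# Linear head over the concatenated two-layer embeddings
head = torch.nn.Linear(2 * HIDDEN_SIZE, NUM_LABELS)

# Stand-in for `emb` from the snippet above: batch=2, seq_len=16
emb = torch.randn(2, 16, 2 * HIDDEN_SIZE)

logits = head(emb)                   # shape: (2, 16, NUM_LABELS)
predictions = logits.argmax(dim=-1)  # one label id per token
```

Training this head (or the full model) then proceeds as for any token-classification setup, e.g. with a cross-entropy loss over the per-token logits.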