---
license: apache-2.0
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
tags:
- language detection
- German
- English
- French
- Spanish
- GEFS
- language detector
datasets:
- papluca/language-identification
language:
- de
- en
- fr
- es
---
# German, English, French and Spanish Language Detector
ImranzamanML/GEFS-language-detector is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base), trained on the papluca [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
On held-out test data the model reaches F1 scores above 0.999 for every supported language (see the Testing Results below), so it identifies these four languages with very high accuracy and reliability.
## Supported languages
The model currently supports four languages; more may be added in the future:
- German (de)
- English (en)
- Spanish (es)
- French (fr)
## Use a pipeline as a high-level helper
```python
from transformers import pipeline
text=["Mir gefällt die Art und Weise, Sprachen zu erkennen",
"I like the way to detect languages",
"Me gusta la forma de detectar idiomas",
"J'aime la façon de détecter les langues"]
pipe = pipeline("text-classification", model="ImranzamanML/GEFS-language-detector")
lang_detect=pipe(text, top_k=1)
print("The detected language is", lang_detect)
```
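With `top_k=1`, each element of `lang_detect` holds the top-scoring prediction for the corresponding input as a dict with `label` and `score` keys, so the call above yields one predicted language code per example.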
## Load the model directly
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("ImranzamanML/GEFS-language-detector")
model = AutoModelForSequenceClassification.from_pretrained("ImranzamanML/GEFS-language-detector")
```
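Building on the tokenizer and model loaded above, the snippet below is a minimal sketch of running inference manually with PyTorch. It assumes the checkpoint's `config.id2label` maps class indices to the language codes listed earlier; that mapping is not shown in this card, so treat it as an assumption.

```python
import torch

texts = ["Mir gefällt die Art und Weise, Sprachen zu erkennen",
         "I like the way to detect languages"]

# Tokenize as a padded batch and run a forward pass without gradients
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Turn logits into probabilities and pick the most likely class per text
probs = torch.softmax(logits, dim=-1)
for text, row in zip(texts, probs):
    idx = int(row.argmax())
    # id2label is assumed to map indices to codes such as "de" or "en"
    print(f"{text!r} -> {model.config.id2label[idx]} ({row[idx].item():.4f})")
```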
## Model Training
| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 0.002600 | 0.000148 |
| 2 | 0.001000 | 0.000015 |
| 3 | 0.000000 | 0.000011 |
| 4 | 0.001800 | 0.000009 |
| 5 | 0.002700 | 0.000016 |
| 6 | 0.001600 | 0.000012 |
| 7 | 0.001300 | 0.000009 |
| 8 | 0.001200 | 0.000008 |
| 9 | 0.000900 | 0.000007 |
| 10 | 0.000900 | 0.000007 |
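Hyperparameters are not listed in this card, so the following is only a rough sketch of how a comparable 10-epoch fine-tuning run could be set up with the Hugging Face `Trainer`. The dataset column names follow the papluca dataset card, and the batch size and learning rate are assumptions, not reported values.

```python
# Hedged sketch of a comparable fine-tuning run; batch size and learning rate
# are illustrative assumptions, not values reported in this model card.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

langs = ["de", "en", "fr", "es"]
label2id = {lang: i for i, lang in enumerate(langs)}
id2label = {i: lang for lang, i in label2id.items()}

# The papluca dataset stores the text in "text" and the language code in "labels"
raw = load_dataset("papluca/language-identification")
raw = raw.filter(lambda ex: ex["labels"] in langs)  # keep only the four languages

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True)
    enc["label"] = [label2id[code] for code in batch["labels"]]
    return enc

tokenized = raw.map(preprocess, batched=True,
                    remove_columns=raw["train"].column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(langs),
    id2label=id2label, label2id=label2id)

args = TrainingArguments(
    output_dir="gefs-language-detector",
    num_train_epochs=10,              # matches the 10 epochs shown above
    per_device_train_batch_size=32,   # assumed
    learning_rate=2e-5,               # assumed
)

trainer = Trainer(model=model, args=args,
                  data_collator=DataCollatorWithPadding(tokenizer),
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"])
trainer.train()
print(trainer.evaluate())  # validation loss after training
```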
## Testing Results
| Language | Precision | Recall | F1 | Accuracy |
|----------|----------:|-------:|-------:|---------:|
| de | 0.9997 | 0.9998 | 0.9998 | 0.9999 |
| en | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| fr | 0.9995 | 0.9996 | 0.9996 | 0.9996 |
| es | 0.9994 | 0.9996 | 0.9995 | 0.9996 |
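The figures above are per-language scores on held-out test data. As an illustration of how such numbers can be computed, the sketch below uses scikit-learn's `classification_report` on predicted versus true language codes; the label lists shown are placeholders, not the actual evaluation data.

```python
# Hedged sketch: per-language precision/recall/F1 with scikit-learn.
# y_true / y_pred are placeholders; in practice they would be the gold labels
# and the pipeline's predictions over the held-out test texts.
from sklearn.metrics import classification_report

y_true = ["de", "en", "fr", "es", "de"]
y_pred = ["de", "en", "fr", "es", "de"]

print(classification_report(y_true, y_pred,
                            labels=["de", "en", "fr", "es"], digits=4))
```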
## About Author
- **Name**: Muhammad Imran Zaman
- **Company**: [Theum AG](https://theum.com/en/index.htm?t=)
- **Professional Links**:
- Kaggle: [Profile](https://www.kaggle.com/muhammadimran112233)
- LinkedIn: [Profile](https://www.linkedin.com/in/muhammad-imran-zaman)
- Google Scholar: [Profile](https://scholar.google.com/citations?user=ulVFpy8AAAAJ&hl=en)
- YouTube: [Channel](https://www.youtube.com/@consolioo)
- GitHub: [Profile](https://github.com/Imran-ml)