Update README.md
README.md
CHANGED
|
@@ -8,64 +8,65 @@ license: other
|
|
| 8 |
# Model Overview
|
| 9 |
This is a multilingual text classification model that can enable data annotation, creation of domain-specific blends and the addition of metadata tags. The model classifies documents into one of 26 domain classes:
|
| 10 |
|
| 11 |
-
'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
|
| 12 |
-
|
| 13 |
-
It supports 52 languages (English and 51 other languages) : 'ar', 'az', 'bg', 'bn', 'ca', 'cs', 'da', 'de', 'el', 'es', 'et', 'fa', 'fi', 'fr', 'gl', 'he', 'hi', 'hr', 'hu', 'hy', 'id', 'is', 'it', 'ka', 'kk', 'kn', 'ko', 'lt', 'lv', 'mk', 'ml', 'mr', 'ne', 'nl', 'no', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sq', 'sr', 'sv', 'ta', 'tr', 'uk', 'ur', 'vi', 'ja', 'zh'
|
| 14 |
```
|
| 15 |
-
|
| 16 |
-
ar Arabic
|
| 17 |
-
az Azerbaijani
|
| 18 |
-
bg Bulgarian
|
| 19 |
-
bn Bengali
|
| 20 |
-
ca Catalan
|
| 21 |
-
cs Czech
|
| 22 |
-
da Danish
|
| 23 |
-
de German
|
| 24 |
-
el Greek
|
| 25 |
-
es Spanish
|
| 26 |
-
et Estonian
|
| 27 |
-
fa Persian
|
| 28 |
-
fi Finnish
|
| 29 |
-
fr French
|
| 30 |
-
gl Galician
|
| 31 |
-
he Hebrew
|
| 32 |
-
hi Hindi
|
| 33 |
-
hr Croatian
|
| 34 |
-
hu Hungarian
|
| 35 |
-
hy Armenian
|
| 36 |
-
id Indonesian
|
| 37 |
-
is Icelandic
|
| 38 |
-
it Italian
|
| 39 |
-
ka Georgian
|
| 40 |
-
kk Kazakh
|
| 41 |
-
kn Kannada
|
| 42 |
-
ko Korean
|
| 43 |
-
lt Lithuanian
|
| 44 |
-
lv Latvian
|
| 45 |
-
mk Macedonian
|
| 46 |
-
ml Malayalam
|
| 47 |
-
mr Marathi
|
| 48 |
-
ne Nepali
|
| 49 |
-
nl Dutch
|
| 50 |
-
no Norwegian
|
| 51 |
-
pl Polish
|
| 52 |
-
pt Portuguese
|
| 53 |
-
ro Romanian
|
| 54 |
-
ru Russian
|
| 55 |
-
sk Slovak
|
| 56 |
-
sl Slovenian
|
| 57 |
-
sq Albanian
|
| 58 |
-
sr Serbian
|
| 59 |
-
sv Swedish
|
| 60 |
-
ta Tamil
|
| 61 |
-
tr Turkish
|
| 62 |
-
uk Ukrainian
|
| 63 |
-
ur Urdu
|
| 64 |
-
vi Vietnamese
|
| 65 |
-
ja Japanese
|
| 66 |
-
zh Chinese
|
| 67 |
```
|
| 68 |
|
| 69 |
# License
|
| 70 |
This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
|
| 71 |
|
|
@@ -126,6 +127,9 @@ Arts_and_Entertainment
|
|
| 126 |
## Evaluation
|
| 127 |
- Metric: PR-AUC
|
| 128 |
|
| 129 |
# Inference
|
| 130 |
- Engine: PyTorch
|
| 131 |
- Test Hardware: V100
|
| 8 |
# Model Overview
|
| 9 |
This is a multilingual text classification model that can enable data annotation, creation of domain-specific blends and the addition of metadata tags. The model classifies documents into one of 26 domain classes:
|
| 10 |
|
| 11 |
```
|
| 12 |
+
'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
|
| 13 |
```
|
| 14 |
|
| 15 |
+
It supports 52 languages (English and 51 other languages):
|
| 16 |
+
| Code | Language Name |
|
| 17 |
+
|------|----------------|
|
| 18 |
+
| ar | Arabic |
|
| 19 |
+
| az | Azerbaijani |
|
| 20 |
+
| bg | Bulgarian |
|
| 21 |
+
| bn | Bengali |
|
| 22 |
+
| ca | Catalan |
|
| 23 |
+
| cs | Czech |
|
| 24 |
+
| da | Danish |
|
| 25 |
+
| de | German |
|
| 26 |
+
| el | Greek |
|
| 27 |
+
| es | Spanish |
|
| 28 |
+
| et | Estonian |
|
| 29 |
+
| fa | Persian |
|
| 30 |
+
| fi | Finnish |
|
| 31 |
+
| fr | French |
|
| 32 |
+
| gl | Galician |
|
| 33 |
+
| he | Hebrew |
|
| 34 |
+
| hi | Hindi |
|
| 35 |
+
| hr | Croatian |
|
| 36 |
+
| hu | Hungarian |
|
| 37 |
+
| hy | Armenian |
|
| 38 |
+
| id | Indonesian |
|
| 39 |
+
| is | Icelandic |
|
| 40 |
+
| it | Italian |
|
| 41 |
+
| ka | Georgian |
|
| 42 |
+
| kk | Kazakh |
|
| 43 |
+
| kn | Kannada |
|
| 44 |
+
| ko | Korean |
|
| 45 |
+
| lt | Lithuanian |
|
| 46 |
+
| lv | Latvian |
|
| 47 |
+
| mk | Macedonian |
|
| 48 |
+
| ml | Malayalam |
|
| 49 |
+
| mr | Marathi |
|
| 50 |
+
| ne | Nepali |
|
| 51 |
+
| nl | Dutch |
|
| 52 |
+
| no | Norwegian |
|
| 53 |
+
| pl | Polish |
|
| 54 |
+
| pt | Portuguese |
|
| 55 |
+
| ro | Romanian |
|
| 56 |
+
| ru | Russian |
|
| 57 |
+
| sk | Slovak |
|
| 58 |
+
| sl | Slovenian |
|
| 59 |
+
| sq | Albanian |
|
| 60 |
+
| sr | Serbian |
|
| 61 |
+
| sv | Swedish |
|
| 62 |
+
| ta | Tamil |
|
| 63 |
+
| tr | Turkish |
|
| 64 |
+
| uk | Ukrainian |
|
| 65 |
+
| ur | Urdu |
|
| 66 |
+
| vi | Vietnamese |
|
| 67 |
+
| ja | Japanese |
|
| 68 |
+
| zh | Chinese |
|
| 69 |
+
|
| 70 |
# License
|
| 71 |
This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
|
| 72 |
|
| 127 |
## Evaluation
|
| 128 |
- Metric: PR-AUC
|
| 129 |
|
| 130 |
+
PR-AUC by language:
|
| 131 |
+
<img src="https://huggingface.co/nvidia/multilingual-domain-classifier/resolve/main/pr_auc_by_language.PNG" alt="pr_auc_by_language" style="width:750px;">
|
| 132 |
+
|
| 133 |
# Inference
|
| 134 |
- Engine: PyTorch
|
| 135 |
- Test Hardware: V100
|
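
The README names PR-AUC as the evaluation metric but does not say how it was computed. As a rough illustration only, the sketch below computes a step-wise PR-AUC (average precision) for each class in a one-vs-rest fashion and then averages across classes. The helper names `average_precision` and `macro_pr_auc`, the macro averaging, the toy data, and the assumption that scores contain no ties are all choices made for this example, not details taken from the model card.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve for one class.

    labels: list of 0/1 ground-truth flags.
    scores: list of predicted scores for the positive class.
    Assumes scores have no ties (tied scores keep their input order).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1/total_pos.
            ap += (true_pos / rank) / total_pos
    return ap


def macro_pr_auc(y_true, y_score, classes):
    """One-vs-rest average precision per class, plus the unweighted mean.

    y_true: list of class names, one per document.
    y_score: list of per-document score rows, ordered like `classes`.
    """
    per_class = {}
    for idx, name in enumerate(classes):
        binary = [1 if y == name else 0 for y in y_true]
        column = [row[idx] for row in y_score]
        per_class[name] = average_precision(binary, column)
    return sum(per_class.values()) / len(per_class), per_class


if __name__ == "__main__":
    # Hypothetical toy data: three of the 26 domain classes, four documents.
    classes = ["News", "Science", "Sports"]
    y_true = ["News", "Sports", "Science", "News"]
    y_score = [
        [0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6],
        [0.2, 0.5, 0.3],
        [0.4, 0.35, 0.25],
    ]
    macro, per_class = macro_pr_auc(y_true, y_score, classes)
    print(f"macro PR-AUC: {macro:.3f}")
    for name, value in per_class.items():
        print(f"  {name}: {value:.3f}")
```

To reproduce a per-language breakdown like the one in the added figure, the same function could be run once per language subset of an evaluation set, assuming each document carries a language tag.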