---
license: mit
datasets:
- CohereForAI/aya_collection_language_split
metrics:
- f1
- recall
- precision
language:
- te
- kn
- gu
- mr
- ml
- bn
- pa
- ta
library_name: transformers
---
This model is based on [Kredor's work](https://huggingface.co/kredor/punctuate-all), but it covers a different set of languages: Telugu, Tamil, Malayalam, Kannada, Gujarati, Punjabi, Marathi, and Bengali.
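A minimal usage sketch, not an official example from this repository. It assumes that the checkpoint loads with `AutoModelForTokenClassification`, that its `id2label` mapping uses the label strings from the report below (`0` for "no punctuation", otherwise the mark itself), and that the label for a word is read from its last sub-token. `MODEL_ID` is a hypothetical placeholder, and `predict_labels` and `apply_punctuation` are illustrative helper names.

```python
from typing import Iterable, List, Tuple

# Hypothetical placeholder: replace with the actual Hub repository id of this model.
MODEL_ID = "your-namespace/your-punctuation-model"


def apply_punctuation(words_with_labels: Iterable[Tuple[str, str]]) -> str:
    """Append each predicted mark to its word; label "0" means no punctuation (assumed)."""
    pieces = [word if label == "0" else word + label for word, label in words_with_labels]
    return " ".join(pieces)


def predict_labels(words: List[str], model_id: str = MODEL_ID) -> List[Tuple[str, str]]:
    """Predict one label per word, taken from the word's last sub-token (an assumption)."""
    # Imported lazily so the pure helper above works without these packages installed.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)

    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]
    pred_ids = logits.argmax(dim=-1).tolist()

    labels = {}
    for pos, word_idx in enumerate(enc.word_ids(0)):
        if word_idx is not None:  # skip special tokens
            labels[word_idx] = model.config.id2label[pred_ids[pos]]  # last sub-token wins
    # Words lost to truncation fall back to "0" (no punctuation).
    return [(w, labels.get(i, "0")) for i, w in enumerate(words)]


# Usage, once MODEL_ID points at a real checkpoint:
#   text = "unpunctuated input text"
#   print(apply_punctuation(predict_labels(text.split())))
```

Long inputs would need to be split into chunks before calling `predict_labels`, since anything past the tokenizer's maximum length is truncated in this sketch.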

----- report -----

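| label | precision | recall | f1-score | support |
|:--|--:|--:|--:|--:|
| `0` | 0.99 | 0.99 | 0.99 | 18156530 |
| `.` | 0.95 | 0.95 | 0.95 | 987478 |
| `,` | 0.82 | 0.79 | 0.80 | 1064002 |
| `?` | 0.97 | 0.96 | 0.97 | 316902 |
| `-` | 0.94 | 0.86 | 0.90 | 226991 |
| `:` | 0.94 | 0.96 | 0.95 | 262314 |
| accuracy | | | 0.97 | 21014217 |
| macro avg | 0.93 | 0.92 | 0.93 | 21014217 |
| weighted avg | 0.97 | 0.97 | 0.97 | 21014217 |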
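As a rough sanity check on the report, the macro averages can be recomputed from the per-class rows. The sketch below uses only the rounded values printed above, so it can differ from the true averages in the last digit. The variable names are illustrative.

```python
# Per-class (recall, f1, support) copied from the report above (rounded values).
report = {
    "0": (0.99, 0.99, 18156530),
    ".": (0.95, 0.95, 987478),
    ",": (0.79, 0.80, 1064002),
    "?": (0.96, 0.97, 316902),
    "-": (0.86, 0.90, 226991),
    ":": (0.96, 0.95, 262314),
}

# Macro average: unweighted mean over classes, so rare marks count as much as "0".
macro_recall = sum(r for r, _, _ in report.values()) / len(report)
macro_f1 = sum(f for _, f, _ in report.values()) / len(report)

# The per-class supports should add up to the total token count in the report.
total_support = sum(s for _, _, s in report.values())

print(round(macro_recall, 2), round(macro_f1, 2), total_support)
```

Because the macro average ignores support, the weaker comma and hyphen scores pull it below the weighted average, which is dominated by the `0` class.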


----- confusion matrix -----

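| true \ predicted | `0` | `.` | `,` | `?` | `-` | `:` |
|:--|--:|--:|--:|--:|--:|--:|
| `0` | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| `.` | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| `,` | 0.2 | 0.0 | 0.8 | 0.0 | 0.0 | 0.0 |
| `?` | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| `-` | 0.1 | 0.0 | 0.0 | 0.0 | 0.9 | 0.0 |
| `:` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |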