---
license: mit
datasets:
- CohereForAI/aya_collection_language_split
metrics:
- f1
- recall
- precision
language:
- te
- kn
- gu
- mr
- ml
- bn
- pa
- ta
library_name: transformers
---

This model is based on [Kredor's work](https://huggingface.co/kredor/punctuate-all), but covers a different set of languages: Telugu, Tamil, Malayalam, Kannada, Gujarati, Punjabi, Marathi, and Bengali.
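
As a minimal sketch of how per-word predictions from a punctuation model of this kind might be turned back into text, the snippet below attaches one label to each word. It assumes the label set from the evaluation report (`0`, `.`, `,`, `?`, `-`, `:`), that `0` means "no punctuation", and that each label describes the mark following a word; the card does not confirm any of this. The helper `apply_punctuation` and the placeholder inputs are hypothetical and are not part of this model or of `transformers`.

```python
from typing import List

# Assumed label set: "0" is taken to mean "no punctuation after this word".
NO_PUNCT = "0"
PUNCT_LABELS = {".", ",", "?", "-", ":"}


def apply_punctuation(words: List[str], labels: List[str]) -> str:
    """Append each predicted punctuation mark to its word and join with spaces."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    out = []
    for word, label in zip(words, labels):
        if label == NO_PUNCT:
            out.append(word)
        elif label in PUNCT_LABELS:
            out.append(word + label)
        else:
            raise ValueError(f"unknown label: {label!r}")
    return " ".join(out)


# Placeholder words and hand-written labels, for illustration only.
words = ["w1", "w2", "w3", "w4", "w5"]
labels = ["0", ",", "0", "0", "?"]
restored = apply_punctuation(words, labels)
print(restored)  # w1 w2, w3 w4 w5?
```

In practice the labels would come from the model's token-classification output, aggregated to one label per word.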

----- report -----

| label | precision | recall | f1-score | support |
|:---|---:|---:|---:|---:|
| `0` | 0.99 | 0.99 | 0.99 | 18156530 |
| `.` | 0.95 | 0.95 | 0.95 | 987478 |
| `,` | 0.82 | 0.79 | 0.80 | 1064002 |
| `?` | 0.97 | 0.96 | 0.97 | 316902 |
| `-` | 0.94 | 0.86 | 0.90 | 226991 |
| `:` | 0.94 | 0.96 | 0.95 | 262314 |
| accuracy | | | 0.97 | 21014217 |
| macro avg | 0.93 | 0.92 | 0.93 | 21014217 |
| weighted avg | 0.97 | 0.97 | 0.97 | 21014217 |
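
The two average rows can be approximately re-derived from the per-class rows. The sketch below assumes the report follows the usual convention (as in scikit-learn's `classification_report`): "macro avg" is the unweighted mean over classes and "weighted avg" is the mean weighted by support. The inputs are the two-decimal values from the report, so the results match the published averages only up to rounding.

```python
# Per-class scores copied from the report: (precision, recall, f1, support).
report = {
    "0": (0.99, 0.99, 0.99, 18156530),
    ".": (0.95, 0.95, 0.95, 987478),
    ",": (0.82, 0.79, 0.80, 1064002),
    "?": (0.97, 0.96, 0.97, 316902),
    "-": (0.94, 0.86, 0.90, 226991),
    ":": (0.94, 0.96, 0.95, 262314),
}

total_support = sum(row[3] for row in report.values())


def macro_avg(col: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col: int) -> float:
    """Mean of one metric column, weighted by each class's support."""
    return sum(row[col] * row[3] for row in report.values()) / total_support


# Index 0 = precision, 1 = recall, 2 = f1.
macro = [macro_avg(i) for i in range(3)]
weighted = [weighted_avg(i) for i in range(3)]

print(total_support)  # 21014217
print([f"{m:.3f}" for m in macro])
print([f"{w:.3f}" for w in weighted])
```

The no-punctuation class `0` accounts for most of the support, which is why the weighted averages sit well above the macro averages.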
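
----- confusion matrix -----

| true / predicted | `0` | `.` | `,` | `?` | `-` | `:` |
|:---|---:|---:|---:|---:|---:|---:|
| `0` | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| `.` | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| `,` | 0.2 | 0.0 | 0.8 | 0.0 | 0.0 | 0.0 |
| `?` | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| `-` | 0.1 | 0.0 | 0.0 | 0.0 | 0.9 | 0.0 |
| `:` | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |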
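
The matrix appears to be normalized per true label, since each row sums to 1.0. Under that assumption, each diagonal entry is that class's recall, rounded to one decimal. The sketch below checks this against the recall values from the report and picks out the largest off-diagonal entry; the variable names are illustrative only.

```python
labels = ["0", ".", ",", "?", "-", ":"]

# Rows are true labels, columns are predicted labels (values from the matrix).
confusion = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]

# Recall per class, copied from the classification report.
reported_recall = {"0": 0.99, ".": 0.95, ",": 0.79, "?": 0.96, "-": 0.86, ":": 0.96}

# Each row should sum to 1.0 if the matrix is normalized by true label.
row_sums = [sum(row) for row in confusion]

# Gap between each diagonal entry and the reported recall for that class.
gaps = {
    label: abs(confusion[i][i] - reported_recall[label])
    for i, label in enumerate(labels)
}

# Largest off-diagonal entry: (share, true label, predicted label).
worst = max(
    (confusion[i][j], labels[i], labels[j])
    for i in range(len(labels))
    for j in range(len(labels))
    if i != j
)

print(row_sums)
print(max(gaps.values()))
print(worst)  # (0.2, ',', '0')
```

The largest confusion is a true comma predicted as `0`, at about 0.2 of all commas, which is consistent with the comma's recall of 0.79 in the report.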