---
language:
- ru
license: cc-by-4.0
library_name: transformers
tags:
- text-classification
- vulnerability
- severity
- cybersecurity
- fstec
- generated_from_trainer
datasets:
- CIRCL/Vulnerability-FSTEC
base_model: ai-forever/ruRoberta-large
pipeline_tag: text-classification
---

# VLAI: Automated Vulnerability Severity Classification (Russian Text)

A fine-tuned [ai-forever/ruRoberta-large](https://huggingface.co/ai-forever/ruRoberta-large) model for classifying the severity of Russian-language vulnerability descriptions from [FSTEC](https://vulnerability.circl.lu/recent#fstec).

Trained on the [CIRCL/Vulnerability-FSTEC](https://huggingface.co/datasets/CIRCL/Vulnerability-FSTEC) dataset as part of the [VulnTrain](https://github.com/vulnerability-lookup/VulnTrain) project.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5

After the final epoch, the model achieves the following results on the evaluation set:
- Loss: 2.6495
- Accuracy: 0.7417
- F1 Macro: 0.6650
- Low Precision: 0.6154
- Low Recall: 0.3380
- Low F1: 0.4364
- Medium Precision: 0.7619
- Medium Recall: 0.8312
- Medium F1: 0.7951
- High Precision: 0.6869
- High Recall: 0.6080
- High F1: 0.6450
- Critical Precision: 0.7678
- Critical Recall: 0.7996
- Critical F1: 0.7834

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | Low Precision | Low Recall | Low F1 | Medium Precision | Medium Recall | Medium F1 | High Precision | High Recall | High F1 | Critical Precision | Critical Recall | Critical F1 |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-------------:|:----------:|:------:|:----------------:|:-------------:|:---------:|:--------------:|:-----------:|:-------:|:------------------:|:---------------:|:-----------:|
| 3.0373 | 1.0 | 1167 | 3.0503 | 0.6895 | 0.5626 | 0.7959 | 0.1099 | 0.1931 | 0.7233 | 0.7958 | 0.7578 | 0.6083 | 0.5152 | 0.5579 | 0.6947 | 0.7954 | 0.7416 |
| 2.9084 | 2.0 | 2334 | 2.8601 | 0.7142 | 0.6048 | 0.8 | 0.1803 | 0.2943 | 0.7523 | 0.8001 | 0.7754 | 0.6923 | 0.5156 | 0.5910 | 0.6660 | 0.8807 | 0.7584 |
| 2.5937 | 3.0 | 3501 | 2.6529 | 0.7335 | 0.6349 | 0.6967 | 0.2394 | 0.3564 | 0.7565 | 0.8379 | 0.7952 | 0.7126 | 0.5411 | 0.6152 | 0.7092 | 0.8488 | 0.7727 |
| 2.5230 | 4.0 | 4668 | 2.6348 | 0.7365 | 0.6549 | 0.6170 | 0.3268 | 0.4273 | 0.7403 | 0.8568 | 0.7943 | 0.7208 | 0.5451 | 0.6207 | 0.7526 | 0.8038 | 0.7773 |
| 2.0599 | 5.0 | 5835 | 2.6495 | 0.7417 | 0.6650 | 0.6154 | 0.3380 | 0.4364 | 0.7619 | 0.8312 | 0.7951 | 0.6869 | 0.6080 | 0.6450 | 0.7678 | 0.7996 | 0.7834 |

### Framework versions

- Transformers 5.5.0
- Pytorch 2.11.0+cu130
- Datasets 4.8.4
- Tokenizers 0.22.2
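
### Checking the reported metrics

The evaluation results above list precision, recall, and F1 per severity class, plus a single F1 Macro figure. As a reading aid, the sketch below recomputes each class F1 as the harmonic mean of its precision and recall, and the macro F1 as the unweighted mean of the four class F1 scores. It uses only the epoch-5 numbers copied from this card. The helper names (`f1_score`, `macro_f1`) are illustrative and are not taken from the VulnTrain code, which may compute these metrics differently.

```python
# Per-class (precision, recall, F1) copied from the epoch-5 evaluation results.
reported = {
    "low": (0.6154, 0.3380, 0.4364),
    "medium": (0.7619, 0.8312, 0.7951),
    "high": (0.6869, 0.6080, 0.6450),
    "critical": (0.7678, 0.7996, 0.7834),
}
REPORTED_MACRO_F1 = 0.6650


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_f1) -> float:
    """Unweighted mean of per-class F1 scores."""
    values = list(per_class_f1)
    return sum(values) / len(values)


# Recompute each class F1 from its rounded precision and recall.
recomputed_f1 = {
    label: f1_score(precision, recall)
    for label, (precision, recall, _) in reported.items()
}

# Average the reported per-class F1 values to get the macro score.
macro_from_reported = macro_f1(f1 for _, _, f1 in reported.values())

for label, (_, _, f1) in reported.items():
    print(f"{label:>8}: reported F1 {f1:.4f}, recomputed {recomputed_f1[label]:.4f}")
print(f"macro F1 from per-class F1: {macro_from_reported:.4f} (reported {REPORTED_MACRO_F1:.4f})")
```

Because the inputs are already rounded to four decimals, the recomputed values can differ from the reported ones in the last digit. The macro average weights all four classes equally, so the weak recall on the Low class (0.3380) pulls F1 Macro well below the overall accuracy.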