URL-TITLE-classifier-preview
Model Overview
This is a preview version of a multi-label web classification model fine-tuned from Alibaba-NLP/gte-modernbert-base. It assigns each website one or more categories based on its URL and title.
The model supports 11 labels: Uncategorized, News, Entertainment, Shop, Chat, Education, Government, Health, Technology, Work, and Travel.
- Developed by: Taimur Hasan
- Model Type: Multi-label Text Classification
- Status: Preview (under active development)
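A minimal inference sketch, assuming the checkpoint loads through the standard `transformers` sequence-classification classes and is published under the repository id `firefoxrecap/URL-TITLE-classifier` (the id shown in this card's model tree; the actual id may differ). The helper names `build_input`, `pick_labels`, and `classify`, and the 0.5 threshold, are illustrative choices and not part of the model. Because this is a multi-label model, each logit goes through a sigmoid independently instead of a softmax across labels.

```python
def build_input(url, title):
    """Join URL and title in the "{url}:{title}" format this card documents."""
    return f"{url}:{title}"


def pick_labels(probs, id2label, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold, best first.

    The 0.5 threshold is an assumption; the card does not state one.
    """
    hits = [(id2label[i], p) for i, p in enumerate(probs) if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def classify(url, title, model_id="firefoxrecap/URL-TITLE-classifier", threshold=0.5):
    """Run one URL/title pair through the model (downloads the weights)."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(build_input(url, title), return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]

    # Independent sigmoid per label: several labels can be active at once.
    probs = torch.sigmoid(logits).tolist()
    # Read the label order from the checkpoint instead of hard-coding it.
    return pick_labels(probs, model.config.id2label, threshold)
```

Calling `classify("https://example.com", "Example Domain")` would return the labels whose probability clears the threshold. Taking the label order from `model.config.id2label` avoids relying on the order in which labels are listed in this card.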
Architecture
- Fine-tuning Strategy: Unfroze the last 4 encoder layers and the pooler
- Problem Type: Multi-label classification
- Output Labels:
News, Entertainment, Shop, Chat, Education, Government, Health, Technology, Work, Travel, Uncategorized
- Input Format: The URL and title joined by a colon into a single string:
"{url}:{title}"
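The fine-tuning strategy listed above (only the last 4 encoder layers and the pooler unfrozen) could be sketched roughly as follows. This is not the author's training code. It assumes a PyTorch model whose encoder parameters are named with a `layers.<index>.` or `layer.<index>.` segment and whose pooling module has `pooler` in its parameter names; both are assumptions that should be checked against `model.named_parameters()` for the real checkpoint. The function names are hypothetical.

```python
import re

# Assumed naming pattern for encoder blocks, e.g. "model.layers.21.mlp.Wo.weight".
LAYER_RE = re.compile(r"(?:^|\.)layers?\.(\d+)\.")


def trainable_names(param_names, last_n=4, extra_keywords=("pooler",)):
    """Pick parameter names in the last `last_n` encoder layers,
    plus any name containing one of `extra_keywords`."""
    indices = set()
    for name in param_names:
        match = LAYER_RE.search(name)
        if match:
            indices.add(int(match.group(1)))
    keep_layers = set(sorted(indices)[-last_n:]) if last_n > 0 else set()

    selected = []
    for name in param_names:
        match = LAYER_RE.search(name)
        if match and int(match.group(1)) in keep_layers:
            selected.append(name)
        elif any(keyword in name for keyword in extra_keywords):
            selected.append(name)
    return selected


def apply_freeze(model, last_n=4, extra_keywords=("pooler",)):
    """Set requires_grad so only the selected parameters are trained."""
    names = [name for name, _ in model.named_parameters()]
    keep = set(trainable_names(names, last_n, extra_keywords))
    for name, param in model.named_parameters():
        param.requires_grad = name in keep
```

If the pooling or classification modules use different names in a given checkpoint, `extra_keywords` can be extended accordingly, for example with `"classifier"`.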
Evaluation Metrics (Validation Data)
| Metric | Value |
|---|---|
| Loss | 0.207 |
| Hamming Loss | 0.083 |
| Exact Match | 0.445 |
| Precision (Micro) | 0.917 |
| Recall (Micro) | 0.917 |
| F1 Score (Micro) | 0.917 |
| Precision (Macro) | 0.795 |
| Recall (Macro) | 0.598 |
| F1 Score (Macro) | 0.677 |
| Precision (Weighted) | 0.798 |
| Recall (Weighted) | 0.647 |
| F1 Score (Weighted) | 0.711 |
| ROC AUC (Micro) | 0.941 |
| ROC AUC (Macro) | 0.928 |
| PR AUC (Micro) | 0.815 |
| PR AUC (Macro) | 0.765 |
| Jaccard (Micro) | 0.848 |
| Jaccard (Macro) | 0.520 |
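For readers less familiar with multi-label metrics, the sketch below shows how Hamming loss, exact match, and micro/macro F1 are conventionally defined. It is a plain-Python illustration on made-up toy data, not the evaluation code used for the table above, and the function names are chosen for this example only.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(tr == pr for tr, pr in zip(y_true, y_pred)) / len(y_true)


def label_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one label column."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        if tr[label] and pr[label]:
            tp += 1
        elif pr[label]:
            fp += 1
        elif tr[label]:
            fn += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    counts = [label_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    return f1(*(sum(col) for col in zip(*counts)))


def macro_f1(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    return sum(f1(*label_counts(y_true, y_pred, j)) for j in range(n_labels)) / n_labels


# Toy data: 3 samples, 3 labels (illustrative only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(hamming_loss(y_true, y_pred), exact_match(y_true, y_pred))
print(micro_f1(y_true, y_pred), macro_f1(y_true, y_pred))
```

Exact match is the strictest of these, since a single wrong label makes the whole sample count as a miss. Macro averages weight every label equally, so rarely predicted labels pull them down more than they pull down micro averages.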
Per-Label F1 Scores
| Label | F1 Score |
|---|---|
| News | 0.605 |
| Entertainment | 0.764 |
| Shop | 0.704 |
| Chat | 0.875 |
| Education | 0.763 |
| Government | 0.667 |
| Health | 0.574 |
| Technology | 0.738 |
| Work | 0.527 |
| Travel | 0.571 |
| Uncategorized | 0.657 |
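As a quick consistency check, macro F1 is the unweighted mean of the per-label F1 scores, so the values in this table should reproduce the F1 Score (Macro) reported in the evaluation table. The snippet below uses only the numbers from this card.

```python
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Unweighted mean over the 11 labels.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677, matching F1 Score (Macro)

# Labels sorted from weakest to strongest, to see where the model struggles.
weakest_first = sorted(per_label_f1, key=per_label_f1.get)
print(weakest_first[:3])  # ['Work', 'Travel', 'Health']
```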
Note: This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.
Model tree for firefoxrecap/URL-TITLE-classifier
- Base model: answerdotai/ModernBERT-base
- Finetuned: Alibaba-NLP/gte-modernbert-base