AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.
🏆 First Place Winner at AraGenEval 2025 Competition
A state-of-the-art Arabic text style transfer model that rewrites input text in the writing style of any of 21 Arabic authors, using descriptive author tokens and prompt engineering.
🔗 Paper Link (ACL Anthology)
📘 ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer [https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf]
🏗️ Model Architecture
- Base Model: UBC-NLP/AraT5v2-base-1024
- Approach: Descriptive Author Tokens + Prompt Engineering
- Input Format (the Arabic instruction means "Write the following text in the style of"):
"اكتب النص التالي بأسلوب <author:name>: [text]"
- Training: Fine-tuned with author-specific tokens
🔬 Technical Details
Stylometric Analysis
The model incorporates comprehensive stylometric analysis including:
- Lexical Features: Sentence length, word length, vocabulary richness
- Syntactic Patterns: Definite articles, conjunctions, prepositions
- Author-Specific Vocabulary: TF-IDF based characteristic words
- Style Classification: Formality, complexity, emotional intensity
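As a rough illustration of the lexical and syntactic features listed above, the sketch below computes a few of them with the Python standard library. It is not the authors' feature extractor: the function name `lexical_features`, the sentence-splitting rule, and the `ال` prefix heuristic for definite articles are all assumptions made for this example.

```python
import re

# Assumption: a sentence ends at ., !, ? or the Arabic question mark.
SENTENCE_END = re.compile(r"[.!?؟]+")
# Arabic diacritics (tanwin, short vowels, shadda, sukun) plus tatweel.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Runs of Arabic letters count as words in this sketch.
WORD = re.compile(r"[\u0621-\u064A]+")


def lexical_features(text: str) -> dict:
    """Return a few simple stylometric features for an Arabic text."""
    plain = DIACRITICS.sub("", text)  # strip diacritics so words are not split
    sentences = [s for s in SENTENCE_END.split(plain) if s.strip()]
    words = WORD.findall(plain)
    if not words or not sentences:
        return {
            "avg_sentence_len": 0.0,
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
            "definite_article_rate": 0.0,
        }
    # Crude heuristic: any word longer than two letters that starts with "ال".
    definite = sum(1 for w in words if len(w) > 2 and w.startswith("ال"))
    return {
        "avg_sentence_len": len(words) / len(sentences),  # words per sentence
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),  # vocabulary richness
        "definite_article_rate": definite / len(words),
    }
```

Comparing these values across authors gives a first, coarse view of how their styles differ. The prefix heuristic over-counts words that merely begin with `ال`, so a morphological analyzer would be a better choice for serious use.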
Prompt Engineering
- Format:
"اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"
- Author Tokens: Descriptive tokens like <author:يوسف_إدريس>
- Target: Generated text in author's style
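The prompt format above can be assembled with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and replacing whitespace in the author name with underscores follows the token shown above, not a documented API.

```python
def build_prompt(author: str, text: str) -> str:
    """Wrap `text` in the style-transfer instruction for `author`."""
    # Join the parts of the author's name with underscores,
    # e.g. "يوسف إدريس" becomes "يوسف_إدريس".
    author_token = "<author:" + "_".join(author.split()) + ">"
    return f"اكتب النص التالي بأسلوب {author_token}: {text}"


prompt = build_prompt("يوسف إدريس", "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي.")
print(prompt)
```

Keeping the prompt construction in one place makes it harder for inference prompts to drift from the format used during fine-tuning.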
📚 Supported Authors
📁 Input File Format
For batch processing, your input file should have the following format:
📊 Example Snippets from the Dataset
| id | text_in_msa (partial) | text_in_author_style (partial) |
|---|---|---|
| 3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..." |
| 3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..." |
| 3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..." |
| 3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..." |
| 3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..." |
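For batch use, rows like those in the table can be turned into prompts before generation. The sketch below assumes a CSV file with `id` and `text_in_msa` columns, as in the table, plus an `author` column. The `author` column, the CSV layout, and the name `prompts_from_csv` are assumptions for illustration only, since this card does not show the exact file format.

```python
import csv
import io


def prompts_from_csv(handle, text_column="text_in_msa", author_column="author"):
    """Yield (id, prompt) pairs from a CSV file object.

    Assumed columns: id, text_in_msa, author (hypothetical layout).
    """
    for row in csv.DictReader(handle):
        author_token = "_".join(row[author_column].split())
        prompt = f"اكتب النص التالي بأسلوب <author:{author_token}>: {row[text_column]}"
        yield row["id"], prompt


# In-memory stand-in for a real file opened with encoding="utf-8".
sample = io.StringIO(
    "id,text_in_msa,author\n"
    "3835,لم أقم مطلقًا بالاحتفال بعيد ميلادي.,يوسف إدريس\n"
)
batch = list(prompts_from_csv(sample))
```

Each prompt in `batch` can then be tokenized and passed to `model.generate` as in the Quick Start example.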
📊 Performance Metrics
- BLEU Score: 24.58
- chrF Score: 59.01
- Competition: First Place in AraGenEval 2025
- Supported Authors: 21 Arabic authors
Official results on the AraGenEval 2025 test set. Our prompt-engineering system ranked first.
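To give a sense of what chrF measures, here is a simplified sentence-level version written with the standard library. It is a sketch under stated assumptions, not the official scorer: it averages character n-gram precision and recall over orders 1 to 6, ignores whitespace, and combines them with an F-score that weights recall more heavily (`beta = 2`). The function names are hypothetical, and its output will not match the reported numbers, which come from the shared task's own evaluation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it works on characters and not on whole words, a metric of this kind gives partial credit for shared stems and affixes, which suits a morphologically rich language like Arabic.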
🚀 Quick Start: Style Transfer Example
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"
# Prompt format
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"
# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True,
)
# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
🎯 Use Cases
- Content Creation: Generate text in specific author styles
- Educational Tools: Demonstrate different writing styles
- Research: Study Arabic literary styles and patterns
- Creative Writing: Inspire new content in classic styles
🤝 Contributing
This model was developed for the AraGenEval 2025 competition. For questions or contributions, please refer to the competition guidelines.
📄 License
This model is released under the same license as the base AraT5v2 model.
BibTeX Citation
@inproceedings{nacar2025anlpers,
title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
pages={49--53},
year={2025}
}