AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.

🏆 First Place Winner at AraGenEval 2025 Competition

A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.

🔗 Paper Link (ACL Anthology)

📘 ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer [https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf]

🏗️ Model Architecture

  • Base Model: UBC-NLP/AraT5v2-base-1024
  • Approach: Descriptive Author Tokens + Prompt Engineering
  • Input Format: "اكتب النص التالي بأسلوب <author:name>: [text]" (Arabic for "Write the following text in the style of <author:name>: [text]")
  • Training: Fine-tuned with author-specific tokens

🔬 Technical Details

Stylometric Analysis

The model incorporates comprehensive stylometric analysis including:

  • Lexical Features: Sentence length, word length, vocabulary richness
  • Syntactic Patterns: Definite articles, conjunctions, prepositions
  • Author-Specific Vocabulary: TF-IDF based characteristic words
  • Style Classification: Formality, complexity, emotional intensity
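
As a rough illustration of the lexical and syntactic features listed above, the following sketch computes a few of them with the Python standard library. It is not the authors' analysis pipeline: the function name `lexical_features`, the feature names, and the heuristic of counting words that start with "ال" as carrying a definite article are all assumptions made for this example.

```python
import re

# Arabic diacritics (tashkeel); stripped so they do not split words apart.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def lexical_features(text: str) -> dict:
    """Compute a few simple stylometric features (hypothetical helper)."""
    plain = ARABIC_DIACRITICS.sub("", text)
    # Split on Latin and Arabic sentence-ending punctuation.
    sentences = [s for s in re.split(r"[.!?؟]+", plain) if s.strip()]
    # \w+ is Unicode-aware in Python 3, so it matches Arabic letters.
    words = re.findall(r"\w+", plain)
    if not words:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
            "definite_article_rate": 0.0,
        }
    # Heuristic (assumption): a word longer than two letters that starts
    # with "ال" is counted as carrying the definite article.
    definite = sum(1 for w in words if w.startswith("ال") and len(w) > 2)
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
        "definite_article_rate": definite / len(words),
    }


features = lexical_features("الولد يقرأ الكتاب. البنت تكتب.")
print(features)
```

The "ال" prefix check will over-count words whose root happens to begin with those letters, so a real analysis would likely rely on a morphological analyzer instead.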

Prompt Engineering

  • Format: "اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"
  • Author Tokens: Descriptive tokens like <author:يوسف_إدريس>
  • Target: Generated text in the author's style
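
The prompt format above can be wrapped in a small helper so the author token is always built the same way. This is a minimal sketch: `PROMPT_TEMPLATE`, `author_token`, and `build_prompt` are names introduced here for illustration and are not part of the released model or its code.

```python
# Template copied from the documented prompt format.
PROMPT_TEMPLATE = "اكتب النص التالي بأسلوب <author:{token}>: {text}"


def author_token(name: str) -> str:
    """Turn an author name into the token body, e.g. 'يوسف إدريس' -> 'يوسف_إدريس'."""
    # split() with no arguments also collapses repeated or stray whitespace.
    return "_".join(name.split())


def build_prompt(author: str, text: str) -> str:
    """Build the model input for one author and one source text."""
    return PROMPT_TEMPLATE.format(token=author_token(author), text=text.strip())


print(build_prompt("يوسف إدريس", "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."))
```

Joining on whitespace rather than replacing single spaces guards against accidental double spaces in the author name, which would otherwise produce a token the model may not have seen during fine-tuning.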

📚 Supported Authors

📁 Input File Format

For batch processing, your input file should have the following format:

📊 Example Snippets from the Dataset

id | text_in_msa (partial) | text_in_author_style (partial)
3835 | "لم أقم مطلقًا بالاحتفال بعيد ميلادي... وكنت أتجادل مع كامل الشناوي..." | "عمري ما احتفلت بعيد ميلادي... وأتشاجر مع كامل الشناوي على ذلك الاكتئاب..."
3836 | "الزمن العام هو العداد الجماعي الذي يسجل السنين... ويبرز الزمن الخاص..." | "الزمن العام يعدّ السنين للناس كلها... أما عدادك الخاص فأنت نادرًا ما تنظر فيه..."
3837 | "مصر الغنية الراقية... اشتراكية وديمقراطية تتفاعل معًا... أحلام الخمسين..." | "مصر المصنِّعة... الكون مائة زهرة... وحين أبلغ الخمسين أبدأ أعيش وأتعلم الموسيقى..."
3838 | "غرابة التجربة... طفولة جادة تمامًا بلا مرح... الطفولة كانت عيبًا..." | "غريبة هي الأفكار... كنتُ رجلًا رهيبًا في ثوب طفل... والطفولة تُهمة نخشى الاعتراف بها..."
3839 | "هذا ليس ندمًا... موجة تفوقك قوة... النصر الحقيقي أن تعيش كما تختار..." | "ليس مرارة ولا ندمًا... أنت تناضل موجة أعتى منك... والحق أن تحيا كما اخترت أنت..."
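
For batch use, rows like the ones above can be turned into prompts in one pass. The sketch below assumes a tab-separated file with a header row and the columns `id` and `text_in_msa`; that layout is an assumption based on the column names shown here, not a documented requirement, and `prompts_from_tsv` is a hypothetical helper.

```python
import csv
import io


def prompts_from_tsv(handle, author: str) -> list:
    """Read (id, text_in_msa) rows and return (id, prompt) pairs.

    Assumes a tab-separated file with a header row (hypothetical layout).
    """
    token = "_".join(author.split())
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["id"], f"اكتب النص التالي بأسلوب <author:{token}>: {row['text_in_msa']}")
        for row in reader
    ]


# In-memory stand-in for a real input file.
sample = "id\ttext_in_msa\n3835\tلم أقم مطلقًا بالاحتفال بعيد ميلادي\n"
pairs = prompts_from_tsv(io.StringIO(sample), "يوسف إدريس")
for row_id, prompt in pairs:
    print(row_id, prompt)
```

Each prompt can then be tokenized and passed to `model.generate` as in the quick start example.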

📊 Performance Metrics

  • BLEU Score: 24.58
  • chrF Score: 59.01
  • Competition: First Place in AraGenEval 2025
  • Supported Authors: 21 Arabic authors

Official results on the AraGenEval 2025 test set. Our prompt-engineering system ranked first.
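
To give an intuition for what chrF measures, here is a simplified sentence-level sketch: a character n-gram F-score with recall weighted more heavily than precision. It is not the official scorer and will not reproduce the numbers above. The function names, the choice of n-gram orders 1 to 6, and beta = 2 are assumptions for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision and recall, then F-beta (0-100)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("عمري ما احتفلت بعيد ميلادي", "عمري ما احتفلت بعيد ميلادي"))
```

For comparable numbers, use an established evaluation toolkit rather than this sketch.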

🚀 Quick Start: Style Transfer Example

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input text and author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"

# Prompt format
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate
output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    early_stopping=True
)

# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Original:", text)
print("Author:", author)
print("Output:", generated_text)

🎯 Use Cases

  • Content Creation: Generate text in specific author styles
  • Educational Tools: Demonstrate different writing styles
  • Research: Study Arabic literary styles and patterns
  • Creative Writing: Inspire new content in classic styles

🤝 Contributing

This model was developed for the AraGenEval 2025 competition. For questions or contributions, please refer to the competition guidelines.

📄 License

This model is released under the same license as the base AraT5v2 model.

BibTeX Citation

@inproceedings{nacar2025anlpers,
  title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
  author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
  booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
  pages={49--53},
  year={2025}
}

