MeowML/ToxicBERT - Turkish Toxic Language Detection

Model Description

ToxicBERT is a fine-tuned BERT model for detecting toxic language in Turkish text. Built on the dbmdz/bert-base-turkish-cased base model, this classifier identifies potentially harmful, offensive, or toxic content in Turkish social media posts, comments, and general text.

Model Details

  • Model Type: Text Classification (Binary)
  • Language: Turkish (tr)
  • Base Model: dbmdz/bert-base-turkish-cased
  • License: MIT
  • Library: Transformers
  • Task: Toxicity Detection

Intended Use

Primary Use Cases

  • Content moderation for Turkish social media platforms
  • Automated filtering of user-generated content
  • Research in Turkish NLP and toxicity detection
  • Educational purposes for understanding toxic language patterns

Out-of-Scope Use

  • This model should not be used as the sole decision-maker for content moderation without human oversight
  • Not suitable for languages other than Turkish
  • Should not be used for sensitive applications without proper validation and testing

Training Data

The model was trained on the Overfit-GM/turkish-toxic-language dataset, which contains Turkish text samples labeled for toxicity. The dataset includes various forms of toxic content commonly found in online Turkish communications.
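
The dataset is available on the Hugging Face Hub and can be loaded with the datasets library. A minimal sketch; the split and column names are not documented here, so inspect the loaded object before use:

    from datasets import load_dataset

    # Load the training dataset referenced above.
    # The "train" split below is an assumption; inspect the printed
    # structure to confirm splits and columns before use.
    ds = load_dataset("Overfit-GM/turkish-toxic-language")
    print(ds)              # available splits and columns
    print(ds["train"][0])  # one labeled example (assumes a "train" split)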

Model Outputs

The model outputs:

  • Binary Classification: 0 (Non-toxic) or 1 (Toxic)
  • Confidence Score: the softmax probability of the predicted class
  • Toxic Probability: the softmax probability assigned to the toxic class (index 1)
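
For a quick way to obtain the predicted label and its score, the Transformers pipeline API can be used. A minimal sketch; the returned label names (e.g. LABEL_0/LABEL_1) depend on the model's id2label configuration, which is not documented here:

    from transformers import pipeline

    # The tokenizer is loaded from the base model, matching the usage examples below.
    classifier = pipeline(
        "text-classification",
        model="MeowML/ToxicBERT",
        tokenizer="dbmdz/bert-base-turkish-cased",
    )

    result = classifier("Merhaba, nasılsın?")[0]
    print(result)  # e.g. {'label': 'LABEL_0', 'score': ...}; label names depend on the model config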

Usage

Quick Start

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
    model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")

    # Prepare text
    text = "Merhaba, nasılsın?"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        prediction = torch.argmax(probabilities, dim=-1)
        
    toxic_probability = probabilities[0][1].item()
    is_toxic = bool(prediction.item())

    print(f"Is toxic: {is_toxic}")
    print(f"Toxic probability: {toxic_probability:.4f}")

Advanced Usage with Custom Class

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    class ToxicLanguageDetector:
        def __init__(self, model_name="MeowML/ToxicBERT"):
            self.tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
            self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
            self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            self.model.to(self.device)
            self.model.eval()
            
        def predict(self, text):
            inputs = self.tokenizer(
                text,
                truncation=True,
                padding='max_length',
                max_length=256,
                return_tensors='pt'
            ).to(self.device)
            
            with torch.no_grad():
                outputs = self.model(**inputs)
                probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
                prediction = torch.argmax(probabilities, dim=-1)
            
            return {
                'text': text,
                'is_toxic': bool(prediction.item()),
                'toxic_probability': probabilities[0][1].item(),
            'confidence': probabilities[0].max().item()
            }

    # Usage
    detector = ToxicLanguageDetector()
    result = detector.predict("Merhaba, nasılsın?")
    print(result)
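
For moderation workloads it is usually more efficient to score texts in mini-batches rather than one at a time. A sketch of a batched helper built on the detector above; the function name and batch size are illustrative, not part of the published model:

    import torch

    def predict_batch(detector, texts, batch_size=32):
        # Illustrative helper: scores a list of texts with ToxicLanguageDetector
        # in mini-batches, returning the same fields as detector.predict().
        results = []
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            inputs = detector.tokenizer(
                batch,
                truncation=True,
                padding=True,  # pad to the longest text in the batch
                max_length=256,
                return_tensors='pt'
            ).to(detector.device)
            with torch.no_grad():
                probabilities = torch.nn.functional.softmax(
                    detector.model(**inputs).logits, dim=-1
                )
            for text, probs in zip(batch, probabilities):
                results.append({
                    'text': text,
                    'is_toxic': bool(probs.argmax().item()),
                    'toxic_probability': probs[1].item(),
                    'confidence': probs.max().item()
                })
        return results

    # Usage
    texts = ["Merhaba, nasılsın?", "Bugün hava çok güzel."]
    for result in predict_batch(detector, texts):
        print(result)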

Limitations and Biases

Limitations

  • The model's performance depends heavily on the training data quality and coverage
  • May have difficulty with context-dependent toxicity (sarcasm, irony)
  • Performance may vary across different Turkish dialects or informal language
  • Shorter texts might be more challenging to classify accurately

Potential Biases

  • The model may reflect biases present in the training dataset
  • Certain topics, demographics, or linguistic patterns might be over- or under-represented
  • Regular evaluation and bias testing are recommended for production use

Ethical Considerations

  • This model should be used responsibly with human oversight
  • False positives and negatives are expected and should be accounted for (see the thresholding sketch after this list)
  • Consider the impact on freedom of expression when implementing automated moderation
  • Regular auditing and updating are recommended to maintain fairness
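
One concrete way to account for false positives and negatives is to replace the default argmax decision with an explicit threshold and a human-review band. A minimal sketch; the threshold values are illustrative choices, not values shipped with the model:

    def moderation_decision(toxic_probability, toxic_threshold=0.8, review_threshold=0.5):
        # Illustrative policy: act automatically only on high-confidence toxic
        # predictions, and route borderline scores to a human reviewer.
        if toxic_probability >= toxic_threshold:
            return "remove"
        if toxic_probability >= review_threshold:
            return "needs_human_review"
        return "keep"

    # Usage with the detector defined above
    result = detector.predict("Merhaba, nasılsın?")
    print(moderation_decision(result['toxic_probability']))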

Technical Specifications

  • Input: Text strings; inputs longer than 256 tokens are truncated (see the length-check sketch below)
  • Output: Binary classification with probability scores
  • Model Size: ~111M parameters (BERT-base architecture), stored as FP32 Safetensors
  • Inference Speed: Runs on both CPU and GPU; the usage examples above move the model to a GPU automatically when one is available
  • Memory Requirements: Roughly 440 MB for the FP32 weights, plus activation memory; fits standard hardware
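
Because inputs beyond 256 tokens are truncated, long documents are only partially scored. A small sketch that checks token length before prediction; the 256 limit matches the max_length used in the usage examples above:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")

    def check_length(text, max_length=256):
        # Count tokens (including special tokens) the way the model will see them.
        n_tokens = len(tokenizer(text)["input_ids"])
        if n_tokens > max_length:
            print(f"Warning: {n_tokens} tokens; only the first {max_length} will be scored.")
        return n_tokens

    check_length("Merhaba, nasılsın?")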

Citation

If you use this model in your research or applications, please cite:

    @misc{meowml_toxicbert_2024,
      title={ToxicBERT: Turkish Toxic Language Detection},
      author={MeowML},
      year={2024},
      publisher={Hugging Face},
      url={https://huggingface.co/MeowML/ToxicBERT}
    }

Acknowledgments

  • Base model: dbmdz/bert-base-turkish-cased
  • Training dataset: Overfit-GM/turkish-toxic-language
  • Built with Hugging Face Transformers library

Contact

For questions, issues, or suggestions, please open an issue in the model repository or contact the MeowML team.


Disclaimer: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use in their applications.
