MeowML/ToxicBERT - Turkish Toxic Language Detection

Model Description

ToxicBERT is a fine-tuned BERT model for detecting toxic language in Turkish text. Built on the dbmdz/bert-base-turkish-cased base model, this classifier identifies potentially harmful, offensive, or toxic content in Turkish social media posts, comments, and general text.

Model Details

  • Model Type: Text Classification (Binary)
  • Language: Turkish (tr)
  • Base Model: dbmdz/bert-base-turkish-cased
  • License: MIT
  • Library: Transformers
  • Task: Toxicity Detection

Intended Use

Primary Use Cases

  • Content moderation for Turkish social media platforms
  • Automated filtering of user-generated content
  • Research in Turkish NLP and toxicity detection
  • Educational purposes for understanding toxic language patterns

Out-of-Scope Use

  • This model should not be used as the sole decision-maker for content moderation without human oversight
  • Not suitable for languages other than Turkish
  • Should not be used for sensitive applications without proper validation and testing

Training Data

The model was trained on the Overfit-GM/turkish-toxic-language dataset, which contains Turkish text samples labeled for toxicity. The dataset includes various forms of toxic content commonly found in online Turkish communications.
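
The dataset is available on the Hugging Face Hub and can be loaded with the datasets library. A minimal sketch; the split and column names are not documented here, so inspect the loaded object before use:

    from datasets import load_dataset

    # Load the training dataset referenced above.
    # The "train" split below is an assumption; inspect the printed
    # structure to confirm splits and columns before use.
    ds = load_dataset("Overfit-GM/turkish-toxic-language")
    print(ds)              # available splits and columns
    print(ds["train"][0])  # one labeled example (assumes a "train" split)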

Model Outputs

The model outputs:

  • Binary Classification: 0 (Non-toxic) or 1 (Toxic)
  • Confidence Score: the softmax probability of the predicted class
  • Toxic Probability: the softmax probability assigned to the toxic class (index 1)
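
For a quick way to obtain the predicted label and its score, the Transformers pipeline API can be used. A minimal sketch; the returned label names (e.g. LABEL_0/LABEL_1) depend on the model's id2label configuration, which is not documented here:

    from transformers import pipeline

    # The tokenizer is loaded from the base model, matching the usage examples below.
    classifier = pipeline(
        "text-classification",
        model="MeowML/ToxicBERT",
        tokenizer="dbmdz/bert-base-turkish-cased",
    )

    result = classifier("Merhaba, nasılsın?")[0]
    print(result)  # e.g. {'label': 'LABEL_0', 'score': ...}; label names depend on the model config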

Usage

Quick Start

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
    model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")

    # Prepare text
    text = "Merhaba, nasılsın?"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        prediction = torch.argmax(probabilities, dim=-1)
        
    toxic_probability = probabilities[0][1].item()
    is_toxic = bool(prediction.item())

    print(f"Is toxic: {is_toxic}")
    print(f"Toxic probability: {toxic_probability:.4f}")

Advanced Usage with Custom Class

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    class ToxicLanguageDetector:
        def __init__(self, model_name="MeowML/ToxicBERT"):
            self.tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
            self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
            self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            self.model.to(self.device)
            self.model.eval()
            
        def predict(self, text):
            inputs = self.tokenizer(
                text,
                truncation=True,
                padding='max_length',
                max_length=256,
                return_tensors='pt'
            ).to(self.device)
            
            with torch.no_grad():
                outputs = self.model(**inputs)
                probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
                prediction = torch.argmax(probabilities, dim=-1)
            
            return {
                'text': text,
                'is_toxic': bool(prediction.item()),
                'toxic_probability': probabilities[0][1].item(),
            'confidence': probabilities[0].max().item()
            }

    # Usage
    detector = ToxicLanguageDetector()
    result = detector.predict("Merhaba, nasılsın?")
    print(result)
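
For moderation workloads it is usually more efficient to score texts in mini-batches rather than one at a time. A sketch of a batched helper built on the detector above; the function name and batch size are illustrative, not part of the published model:

    import torch

    def predict_batch(detector, texts, batch_size=32):
        # Illustrative helper: scores a list of texts with ToxicLanguageDetector
        # in mini-batches, returning the same fields as detector.predict().
        results = []
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            inputs = detector.tokenizer(
                batch,
                truncation=True,
                padding=True,  # pad to the longest text in the batch
                max_length=256,
                return_tensors='pt'
            ).to(detector.device)
            with torch.no_grad():
                probabilities = torch.nn.functional.softmax(
                    detector.model(**inputs).logits, dim=-1
                )
            for text, probs in zip(batch, probabilities):
                results.append({
                    'text': text,
                    'is_toxic': bool(probs.argmax().item()),
                    'toxic_probability': probs[1].item(),
                    'confidence': probs.max().item()
                })
        return results

    # Usage
    texts = ["Merhaba, nasılsın?", "Bugün hava çok güzel."]
    for result in predict_batch(detector, texts):
        print(result)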

Limitations and Biases

Limitations

  • The model's performance depends heavily on the training data quality and coverage
  • May have difficulty with context-dependent toxicity (sarcasm, irony)
  • Performance may vary across different Turkish dialects or informal language
  • Shorter texts might be more challenging to classify accurately

Potential Biases

  • The model may reflect biases present in the training dataset
  • Certain topics, demographics, or linguistic patterns might be over- or under-represented
  • Regular evaluation and bias testing are recommended for production use

Ethical Considerations

  • This model should be used responsibly with human oversight
  • False positives and negatives are expected and should be accounted for (see the thresholding sketch after this list)
  • Consider the impact on freedom of expression when implementing automated moderation
  • Regular auditing and updating are recommended to maintain fairness
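
One concrete way to account for false positives and negatives is to replace the default argmax decision with an explicit threshold and a human-review band. A minimal sketch; the threshold values are illustrative choices, not values shipped with the model:

    def moderation_decision(toxic_probability, toxic_threshold=0.8, review_threshold=0.5):
        # Illustrative policy: act automatically only on high-confidence toxic
        # predictions, and route borderline scores to a human reviewer.
        if toxic_probability >= toxic_threshold:
            return "remove"
        if toxic_probability >= review_threshold:
            return "needs_human_review"
        return "keep"

    # Usage with the detector defined above
    result = detector.predict("Merhaba, nasılsın?")
    print(moderation_decision(result['toxic_probability']))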

Technical Specifications

  • Input: Text strings; inputs longer than 256 tokens are truncated (see the length-check sketch below)
  • Output: Binary classification with probability scores
  • Model Size: ~111M parameters (BERT-base architecture), stored as FP32 Safetensors
  • Inference Speed: Runs on both CPU and GPU; the usage examples above move the model to a GPU automatically when one is available
  • Memory Requirements: Roughly 440 MB for the FP32 weights, plus activation memory; fits standard hardware
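
Because inputs beyond 256 tokens are truncated, long documents are only partially scored. A small sketch that checks token length before prediction; the 256 limit matches the max_length used in the usage examples above:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")

    def check_length(text, max_length=256):
        # Count tokens (including special tokens) the way the model will see them.
        n_tokens = len(tokenizer(text)["input_ids"])
        if n_tokens > max_length:
            print(f"Warning: {n_tokens} tokens; only the first {max_length} will be scored.")
        return n_tokens

    check_length("Merhaba, nasılsın?")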

Citation

If you use this model in your research or applications, please cite:

    @misc{meowml_toxicbert_2024,
      title={ToxicBERT: Turkish Toxic Language Detection},
      author={MeowML},
      year={2024},
      publisher={Hugging Face},
      url={https://huggingface.co/MeowML/ToxicBERT}
    }

Acknowledgments

  • Base model: dbmdz/bert-base-turkish-cased
  • Training dataset: Overfit-GM/turkish-toxic-language
  • Built with Hugging Face Transformers library

Contact

For questions, issues, or suggestions, please open an issue in the model repository or contact the MeowML team.


Disclaimer: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use in their applications.
