---
library_name: transformers
tags:
  - grammar-correction
  - t5
  - text-to-text
  - english
license: apache-2.0
datasets:
  - chaojiang06/wiki_auto
language:
  - en
base_model:
  - google-t5/t5-small
pipeline_tag: text2text-generation
---

# T5-Small Grammar Correction

A fine-tuned `t5-small` model for correcting grammatical errors in English text. Given an input sentence, the model generates a grammatically corrected version using T5's text-to-text approach.
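A minimal inference sketch using the `text2text-generation` pipeline. The repository id `Harshathemonster/t5-small-updated` is assumed from the page title, and the card does not state whether a task prefix (e.g. `"grammar: "`) was used during fine-tuning, so verify both against the published model:

```python
from transformers import pipeline

# Repo id assumed from the page title; confirm it matches the published model.
corrector = pipeline(
    "text2text-generation",
    model="Harshathemonster/t5-small-updated",
)

# If the model was trained with a task prefix, prepend it to the input here.
result = corrector("she go to the market yesterday", max_new_tokens=64)
print(result[0]["generated_text"])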

## Model Details

- **Developed by:** Harsha Vardhan N
- **Model type:** Sequence-to-sequence Transformer
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** `google-t5/t5-small`

## Training Details

### Training Data

The model was fine-tuned on the `wiki_auto` dataset (`auto_full_with_split` configuration), a large-scale corpus designed for sentence-level simplification. It contains aligned pairs of complex and simplified English sentences extracted from English Wikipedia and Simple English Wikipedia. For this task, the aligned pairs were repurposed as source/target examples, training the model to rewrite input sentences into fluent, grammatically correct English.
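The dataset can be loaded with the `datasets` library; a sketch, assuming the configuration name given on the card (the exact field names of each example vary by configuration, so inspect one record before building the preprocessing step):

```python
from datasets import load_dataset

# Configuration name taken from the card; verify against the dataset repo.
ds = load_dataset("chaojiang06/wiki_auto", "auto_full_with_split")

# Inspect one aligned pair to confirm the source/target field names
# before wiring up tokenization.
print(ds["train"][0])
```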

### Training Procedure

- **Epochs:** 3
- **Training duration:** ~1 hour
- **Optimizer:** AdamW (via the Hugging Face `Seq2SeqTrainer`)
- **Learning rate:** 5e-5
- **Batch size:** 8
- **Environment:** Google Colab GPU
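The setup above can be sketched as a `Seq2SeqTrainer` configuration. The hyperparameters come from the card; the `output_dir` name and the `tokenized_train` dataset are placeholders (tokenization of the wiki_auto pairs is not described on the card and must be prepared separately):

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")

# Hyperparameters as listed on the card; output_dir is a placeholder.
args = Seq2SeqTrainingArguments(
    output_dir="t5-small-grammar",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,  # placeholder: tokenized wiki_auto pairs
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```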

## Technical Specifications

### Compute Infrastructure

#### Hardware

- **GPU:** Google Colab-provided GPU (likely an NVIDIA Tesla T4)

#### Software

- **Frameworks:** Hugging Face Transformers, PyTorch
- **Trainer:** `Seq2SeqTrainer`