---
library_name: transformers
license: mit
datasets:
- jhu-clsp/jfleg
language:
- en
base_model:
- google-t5/t5-base
pipeline_tag: text2text-generation
---
# 📚 Model Card for Grammar Correction Model
This is a grammar correction model based on Google's T5 architecture, fine-tuned on the jhu-clsp/jfleg dataset for grammatical error correction in English. ✍️
## Model Details
This model corrects grammatical errors in English sentences. It was fine-tuned on the JFLEG dataset, which pairs ungrammatical sentences with fluent, human-written corrections.
- **Follow the Developer:** Abdul Samad Siddiqui ([@samadpls](https://github.com/samadpls)) 👨‍💻
## Uses
This model can be directly used to correct grammar and spelling mistakes in sentences. ✅
### Example Usage
Here's a basic code snippet to demonstrate how to use the model:
```python
import requests

# Hosted Inference API endpoint for this model
API_URL = "https://api-inference.huggingface.co/models/samadpls/t5-base-grammar-checker"
HEADERS = {"Authorization": "Bearer YOUR_HF_API_KEY"}  # replace with your own token

def query(payload):
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

# Inputs carry the "grammar: " task prefix used throughout this card
data = query({"inputs": "grammar: This sentences, has bads grammar and spelling!"})
print(data)
```
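On success, the hosted Inference API typically returns a JSON list of the form `[{"generated_text": "..."}]`; an `{"error": ...}` body usually means the model is still loading and the request should be retried after a short wait.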
Or run the model locally with the `transformers` library:
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the model and tokenizer
model_name = "samadpls/t5-base-grammar-checker"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Example input; the "grammar: " task prefix marks the task for the model
example_1 = "grammar: This sentences, has bads grammar and spelling!"

# Tokenize and generate the corrected output
inputs = tokenizer.encode(example_1, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)  # leave room for the full correction
corrected_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Corrected Sentence:", corrected_sentence)
```
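Both snippets prepend the `grammar: ` task prefix to the input, following the T5 convention of marking the task in the prompt; inputs without the prefix may not be corrected reliably.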
## Training Details
The model was trained on the jhu-clsp/jfleg dataset, which pairs sentences containing grammatical errors with human-written corrections. 📖
### Training Procedure
- **Training Hardware:** Personal laptop with NVIDIA GeForce MX230 GDDR5 and 16GB RAM 💻
- **Training Time:** Approximately 1 hour ⏳
- **Hyperparameters:** None were explicitly tuned; library defaults were used. A hedged fine-tuning sketch follows this list.
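The exact training script is not included in this card. The sketch below shows one plausible way to reproduce the setup with Hugging Face `Seq2SeqTrainer` defaults; the `jhu-clsp/jfleg` field names (`sentence`, `corrections`), the `validation` split, and the `grammar: ` prefix are assumptions based on the public dataset and the usage examples above, not the author's confirmed script.

```python
# Minimal fine-tuning sketch (assumed setup, not the author's exact script).
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

model_name = "google-t5/t5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# JFLEG pairs each source sentence with several human corrections;
# the public release exposes "validation" and "test" splits (assumption).
raw = load_dataset("jhu-clsp/jfleg", split="validation")

def flatten(batch):
    # Expand each sentence into one (source, correction) pair per reference,
    # prepending the "grammar: " task prefix seen in the usage examples.
    inputs, targets = [], []
    for sentence, corrections in zip(batch["sentence"], batch["corrections"]):
        for correction in corrections:
            inputs.append("grammar: " + sentence)
            targets.append(correction)
    return {"input_text": inputs, "target_text": targets}

pairs = raw.map(flatten, batched=True, remove_columns=raw.column_names)

def tokenize(batch):
    model_inputs = tokenizer(batch["input_text"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["target_text"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = pairs.map(tokenize, batched=True, remove_columns=pairs.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    # One epoch to match the "Final Epoch: 1.0" metric; everything else default.
    args=Seq2SeqTrainingArguments(output_dir="t5-base-grammar-checker", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

JFLEG provides roughly four human corrections per source sentence, so flattening yields about four training pairs per sentence.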
### Training Logs
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 1 | 0.9282 | 0.6091 |
| 2 | 0.6182 | 0.5561 |
| 3 | 0.6279 | 0.5345 |
| 4 | 0.6345 | 0.5147 |
| 5 | 0.5636 | 0.5076 |
| 6 | 0.6009 | 0.4928 |
| 7 | 0.5469 | 0.4950 |
| 8 | 0.5797 | 0.4834 |
| 9 | 0.5619 | 0.4818 |
| 10 | 0.6342 | 0.4788 |
| 11 | 0.5481 | 0.4786 |
### Final Training Metrics
- **Training Runtime:** 1508.2528 seconds ⏱️
- **Training Samples per Second:** 1.799
- **Training Steps per Second:** 0.225
- **Final Training Loss:** 0.5925
- **Final Epoch:** 1.0
## Model Card Contact
For inquiries, please contact Abdul Samad Siddiqui via GitHub. 📬