Gemma-3 270M Teacher Evaluation Model
Model Description
This model is a fine-tuned version of unsloth/gemma-3-270m-it, trained to grade student texts the way a teacher would. Given an educational text, it assigns a numerical evaluation on a scale of 0-5.
Training Details
- Base Model: Gemma-3 270M Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Training Dataset: Custom educational texts with teacher evaluations
- Training Samples: 3,646 training + 912 validation samples
- Epochs: 3
- Training Time: ~15 minutes
- Final Training Loss: 0.3782
- Final Validation Loss: 0.4566
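The sample counts above correspond to roughly an 80/20 train/validation split, as this quick check shows:

```python
# Verify the train/validation proportions implied by the sample counts above.
train_samples, val_samples = 3646, 912
val_fraction = val_samples / (train_samples + val_samples)
print(f"validation fraction: {val_fraction:.0%}")  # ≈ 20%
```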
Model Performance
The model shows good generalization with minimal overfitting:
- Training loss decreased from 0.57 to 0.38 (a 33% reduction)
- Validation loss remained stable around 0.45-0.46
- Small gap between training and validation loss indicates good generalization
Usage
```python
from unsloth import FastModel

# Load the model (FastModel returns both the model and its tokenizer)
model, tokenizer = FastModel.from_pretrained("carlosvillu/gemma3-270m-teacher-evaluation")

# Example prompt (in Spanish): the system message instructs the model to act as
# a teacher and grade the student's answer on a 0-5 scale.
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Eres una profesora de primera, tu tarea es la de evaluar los textos que está escribiendo tus alumnas como respuesta a una pregunta que le has formulado.
La evaluación, tiene que ser numérica y solo puedes ponerle una nota de 0 a 5.
<|eot_id|><|start_header_id|>user<|end_header_id|>
Un alumno del curso 2º de educación secundaria de cataluña ha respondido a la consigna `Describe tu ciudad` con:
Mi ciudad es muy bonita. Tiene muchas casas y parques. Me gusta vivir aquí porque hay tiendas y restaurantes.
<|eot_id|><|start_header_id|>assistant<|end_header_id|>'''

# Generate the evaluation (do_sample=True is required for temperature to apply)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Training Configuration
- LoRA rank: 128
- Learning rate: 2e-4
- Batch size: 8 (effective)
- Scheduler: Cosine with warmup
- Weight decay: 0.01
- Gradient clipping: 1.0
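For reference, the hyperparameters above could be expressed in an Unsloth + TRL training setup roughly like the following. This is a reconstruction, not the original training script: the dataset variables are placeholders, the per-device batch size / accumulation split is one way to reach an effective batch of 8, and the warmup ratio is an assumption (the card only states "cosine with warmup"):

```python
from unsloth import FastModel
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastModel.from_pretrained("unsloth/gemma-3-270m-it")
model = FastModel.get_peft_model(model, r=128)  # LoRA rank 128

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,        # placeholder: 3,646 samples
    eval_dataset=eval_dataset,          # placeholder: 912 samples
    args=SFTConfig(
        num_train_epochs=3,
        learning_rate=2e-4,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 8
        lr_scheduler_type="cosine",
        warmup_ratio=0.05,              # assumption: warmup fraction not stated
        weight_decay=0.01,
        max_grad_norm=1.0,              # gradient clipping
    ),
)
```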
Evaluation Scale
The model evaluates texts on a scale of 0-5:
- 0: Substantially below expected quality for educational level
- 1: Slightly below expected quality
- 2: Barely sufficient quality
- 3: Expected quality for educational level
- 4: Slightly above expected quality
- 5: Substantially above expected quality
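The rubric above can be kept alongside the parsed grade, for example with a hypothetical lookup helper like this:

```python
# Map each grade on the 0-5 scale to its rubric description from the model card.
SCALE = {
    0: "Substantially below expected quality for educational level",
    1: "Slightly below expected quality",
    2: "Barely sufficient quality",
    3: "Expected quality for educational level",
    4: "Slightly above expected quality",
    5: "Substantially above expected quality",
}

def describe(score: int) -> str:
    """Return the rubric description for a grade, rejecting out-of-range values."""
    if score not in SCALE:
        raise ValueError("Grade must be an integer from 0 to 5")
    return SCALE[score]
```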
Created by
Fine-tuned by carlosvillu using Unsloth for efficient training.
Training date: 2025-08-17