VAR-d30-GRPO-Aesthetic

VAR-d30-GRPO-Aesthetic builds on the VAR-d30 model (2B parameters), which was pre-trained on ImageNet. It was then fine-tuned with Group Relative Policy Optimization (GRPO) to improve image aesthetics. The aesthetic reward signal comes from LAION's aesthetic predictor V2, which feeds CLIP embeddings into a multi-layer perceptron and was trained on 176,000 images that users rated on an aesthetic scale from 1 to 10.
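
As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes the rewards of several images sampled for the same condition against that group's own mean and standard deviation, so each sample is credited by how it compares with its siblings instead of by a learned value function. This is a minimal sketch and not the paper's training code: the function name `group_relative_advantages`, the `eps` constant, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples.

    Hypothetical helper: `rewards` holds one scalar reward (for example an
    aesthetic score) per image sampled for the same condition.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every sample receives the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up aesthetic scores for four images sampled for the same class label.
scores = [5.1, 6.3, 4.8, 5.8]
advantages = group_relative_advantages(scores)
```

Samples scoring above the group mean receive a positive advantage and those below it a negative one. In a full GRPO update these advantages would weight the policy-gradient term for the tokens of each sampled image.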

This model accompanies the paper "Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization" by Matteo Gallici and Haitz Sáez de Ocáriz Borde, presented at the 2nd Workshop on Models of Human Feedback for AI Alignment (ICML 2025).

@inproceedings{
  gallici2025finetuning,
  title={Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization},
  author={Matteo Gallici and Haitz S{\'a}ez de Oc{\'a}riz Borde},
  booktitle={2nd Workshop on Models of Human Feedback for AI Alignment},
  year={2025},
  url={https://openreview.net/forum?id=lZK2svoMne}
}