# aliangdw/rfm_v1
This is a Reward Function Model (RFM) for vision-language preference learning and similarity assessment.
## Model Details
- Base Model: Qwen/Qwen2.5-VL-3B-Instruct
- Model Type: qwen2_5_vl
- Architecture: RFMModel
- Task: Vision-Language Reward Modeling
- Training Method: FSDP (Fully Sharded Data Parallel)
## Usage

```python
from transformers import AutoProcessor, AutoModel
import torch
# Load model and processor
processor = AutoProcessor.from_pretrained("aliangdw/rfm_v1", trust_remote_code=True)
model = AutoModel.from_pretrained("aliangdw/rfm_v1", trust_remote_code=True)
# Example usage for preference scoring
# inputs = processor(images=images, text=text, return_tensors="pt")
# outputs = model(**inputs, sample_type="preference")
```
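A slightly more concrete (and hypothetical) version of the commented example above, reusing `processor` and `model` from the snippet. It assumes the processor accepts a list of frames plus a task instruction and that the remote `RFMModel` forward pass returns preference scores; the frame files, instruction text, and output handling are illustrative only:

```python
from PIL import Image
import torch

# Hypothetical example: the frame file names, instruction text, and the shape
# of the model output are assumptions -- adapt them to the remote RFMModel code.
frames = [Image.open(f"frame_{i}.png") for i in range(4)]
instruction = "Pick up the red block and place it in the bin."

inputs = processor(images=frames, text=instruction, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, sample_type="preference")

print(outputs)  # inspect the returned preference score(s)
```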
## Model Capabilities

This RFM model can perform three kinds of scoring (a hypothetical mode-switching sketch follows the list):
- Preference Prediction: Given two trajectories A and B, predict which one is preferred
- Similarity Assessment: Evaluate how similar a trajectory is to a reference
- Progress Estimation: Estimate task completion progress
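The usage snippet above only demonstrates `sample_type="preference"`. If the remote `RFMModel` exposes the other two capabilities through the same argument, switching modes might look like the sketch below; the `"similarity"` and `"progress"` values are guesses based on the capability names, not documented options:

```python
# Hypothetical: these sample_type values are inferred from the capability list
# above; check the remote modeling code for the actual argument names.
# `model` and `inputs` are the objects built in the Usage section.
similarity_out = model(**inputs, sample_type="similarity")  # closeness to a reference trajectory
progress_out = model(**inputs, sample_type="progress")      # estimated task completion
```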
## Training

The model was trained using (a generic setup sketch follows the list):
- FSDP for distributed training
- Mixed precision (bfloat16)
- Custom loss functions for preference and similarity learning
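For orientation, here is a minimal, generic sketch of FSDP with bfloat16 mixed precision in PyTorch. It only illustrates the setup named above; it is not this model's actual training script, loss code, or hyperparameters:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from transformers import AutoModel

# Generic FSDP + bfloat16 illustration (not this repository's training code).
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModel.from_pretrained("aliangdw/rfm_v1", trust_remote_code=True).cuda()

bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
model = FSDP(model, mixed_precision=bf16)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# ... the training loop with the custom preference/similarity losses goes here ...
```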
## Files
This repository contains:
- Model weights in SafeTensors format
- Configuration files
- Tokenizer/Processor files
## Citation
If you use this model, please cite: