Differentiable Evolutionary Reinforcement Learning
Paper: arXiv 2512.13399
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct on the sft_with_format dataset.
It serves as the base Meta-Optimizer for DERL and is used for all tasks.
The following hyperparameters were used during training:
Base model: Qwen/Qwen2.5-0.5B