Add comprehensive model card (#1)

Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED
---
license: mit
pipeline_tag: text-generation
library_name: transformers
tags:
- code-generation
- reinforcement-learning
- formal-methods
- dafny
---

# Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

This repository hosts models and resources from the **Re:Form** project, as detailed in the paper:

- 📄 [**Paper: Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny**](https://huggingface.co/papers/2507.16331)
- 🌐 [Project Page](https://veri-code.github.io/ReForm-page)
- 💻 [GitHub Repository](https://github.com/Veri-Code/ReForm)

Re:Form grounds Large Language Models (LLMs) in a rigorous formal system, Dafny, so that both their reasoning processes and their outputs can be verified automatically and with mathematical guarantees. This addresses the reliability and scalability problems faced by LLM approaches built on informal natural language. To reduce reliance on human priors, the project pairs an automatic, scalable data curation pipeline with careful Reinforcement Learning (RL) designs that take their feedback from formal language verifiers.

<div align="center">
<img src="https://github.com/Veri-Code/ReForm/raw/main/assets/pipeline.png" width="70%" alt="Overall Pipeline"/>
</div>

## Key Highlights

* **Formal Language Grounding**: Re:Form grounds LLMs in formal languages like Dafny, enabling automatic and mathematically provable verification of their reasoning and outcomes.
* **Reduced Human Priors**: It systematically minimizes the need for extensive human-annotated chain-of-thought and other priors for complex programming tasks.
* **Automated Data Curation**: Introduces an automatic and scalable pipeline for curating training data.
* **RL Integration**: Incorporates careful Reinforcement Learning designs integrated with feedback from formal language verifiers.
* **DafnyComp Benchmark**: Introduces `DafnyComp`, a new benchmark of compositional formal programs with auto-formalized specifications.
* **Superior Performance**: Even small Re:Form models (e.g., 0.5B) generate syntactically valid and verifiable Dafny code, surpassing proprietary models. RL further enhances performance, generalizing better to out-of-domain tasks and outperforming strong baselines on DafnyComp.

## Model Checkpoints

The project provides Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) fine-tuned model checkpoints, hosted under the [Veri-Code organization](https://huggingface.co/Veri-Code) on Hugging Face.

Available checkpoints:
* **SFT Models**: `sft_0.5B`, `sft_1.5B`, `sft_3B`, `sft_7B`, `sft_14B`
* **RL Fine-tuned Model**: `14B-RL-entropy`

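Each checkpoint can be pulled directly by its Hub ID. A minimal sketch, assuming the names above are published verbatim under the `Veri-Code` organization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository IDs assumed from the checkpoint names listed above.
CHECKPOINTS = [
    "Veri-Code/sft_0.5B",
    "Veri-Code/sft_1.5B",
    "Veri-Code/sft_3B",
    "Veri-Code/sft_7B",
    "Veri-Code/sft_14B",
    "Veri-Code/14B-RL-entropy",
]

# Load the smallest SFT model as a quick smoke test.
name = CHECKPOINTS[0]
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
```
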
## Datasets

The project utilizes the following datasets:
* **Python2Dafny**: 18k examples
* **DafnyComp**: 300 examples

These datasets ship as JSON and can be transformed into parquet files for training using the `src.data_preprocess` script in the GitHub repository, as sketched below.

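As a rough illustration of that conversion (the real logic lives in `src.data_preprocess`; the file names and field layout here are hypothetical), a JSON list of examples can be rewritten as parquet with pandas:

```python
import pandas as pd

# Hypothetical file name; the actual layout is defined by the
# src.data_preprocess script in the ReForm repository.
examples = pd.read_json("python2dafny.json")

# Write the same records to a parquet file for the training pipeline
# (requires pyarrow or fastparquet to be installed).
examples.to_parquet("python2dafny.parquet", index=False)
```
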
## How to Use

You can load and use the Re:Form models with the Hugging Face `transformers` library. This example demonstrates how to generate Dafny code from a prompt.

````python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer (replace with your desired checkpoint, e.g., "Veri-Code/sft_14B")
model_name = "Veri-Code/sft_1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # use torch.float16 if your hardware lacks bfloat16 support
    device_map="auto",
    trust_remote_code=True,      # only needed if the checkpoint ships custom code
)

# Example prompt for Dafny code generation, written in the Qwen2 chat format.
# The model is trained to generate Dafny code given a context or problem statement.
prompt = """<|im_start|>user
Here is a simple sorting algorithm in Dafny:
```dafny
method InsertionSort(a: array<int>)
  modifies a
  ensures forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
{
```<|im_end|>
<|im_start|>assistant
```dafny"""

# Tokenize the manually formatted chat prompt.
# The prompt must match the chat format used during training (Qwen2-style here).
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Generate Dafny code
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,  # adjust as needed for the expected code length
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,  # stop generation at the EOS token
)

# Decode and print only the newly generated tokens, excluding the prompt.
new_tokens = generated_ids[0][model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
````

Note: the exact prompting format for optimal Dafny code generation and verification might involve specific tokens or structures used during the model's training. Refer to the [GitHub repository](https://github.com/Veri-Code/ReForm) for more detailed usage examples, training scripts, and interaction with the Dafny environment.
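If the tokenizer ships a chat template (Qwen2-style tokenizers typically do), the same prompt can also be built with `apply_chat_template` instead of hand-writing the special tokens. A minimal sketch, assuming the checkpoint includes such a template:

```python
# Build the prompt via the tokenizer's chat template rather than
# hard-coding the <|im_start|> / <|im_end|> markers.
messages = [
    {"role": "user", "content": "Complete this Dafny method so that it verifies: ..."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant-turn header
)
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
```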
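Because the point of Re:Form is verifiable output, it is natural to check generated code mechanically. A minimal sketch, assuming the Dafny CLI (`dafny verify`, Dafny 4.x) is installed and the generated text has been trimmed to pure Dafny code:

```python
import subprocess
import tempfile

def verify_dafny(code: str) -> bool:
    """Write `code` to a temporary .dfy file and run the Dafny verifier on it."""
    with tempfile.NamedTemporaryFile("w", suffix=".dfy", delete=False) as f:
        f.write(code)
        path = f.name
    # `dafny verify` exits with code 0 when all proof obligations are discharged.
    result = subprocess.run(["dafny", "verify", path], capture_output=True, text=True)
    return result.returncode == 0
```
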
## Citation

If you use this work in your research, please cite:

```bibtex
@misc{yan2025reformreducinghuman,
      title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny},
      author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
      year={2025},
      eprint={2507.16331},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.16331},
}
```