SiniShell1 nielsr HF Staff commited on
Commit
6671dc3
·
verified ·
1 Parent(s): 0278193

Add comprehensive model card (#1)

Browse files

- Add comprehensive model card (04ddf57af53cb82540733a98b24bf41df04c11ce)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1) hide show
  1. README.md +120 -3
README.md CHANGED
@@ -1,3 +1,120 @@
1
- ---
2
- license: mit
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ pipeline_tag: text-generation
4
+ library_name: transformers
5
+ tags:
6
+ - code-generation
7
+ - reinforcement-learning
8
+ - formal-methods
9
+ - dafny
10
+ ---
11
+
12
+ # Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
13
+
14
+ This repository hosts models and resources from the **Re:Form** project, as detailed in the paper:
15
+
16
+ - 📄 [**Paper: Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny**](https://huggingface.co/papers/2507.16331)
17
+ - 🌐 [Project Page](https://veri-code.github.io/ReForm-page)
18
+ - 💻 [GitHub Repository](https://github.com/Veri-Code/ReForm)
19
+
20
+ Re:Form introduces a novel approach to formal software verification using Large Language Models (LLMs) by grounding them in rigorous formal systems like Dafny. This enables the automatic and mathematically provable verification of their reasoning processes and outcomes, addressing the challenges of reliability and scalability faced by existing informal language-based LLMs. The project systematically explores ways to reduce reliance on human priors by integrating an automatic and scalable data curation pipeline with careful Reinforcement Learning (RL) designs that leverage feedback from formal language verifiers.
21
+
22
+ <div align="center">
23
+ <img src="https://github.com/Veri-Code/ReForm/raw/main/assets/pipeline.png" width="70%" alt="Overall Pipeline"/>
24
+ </div>
25
+
26
+ ## Key Highlights
27
+
28
+ * **Formal Language Grounding**: Re:Form grounds LLMs in formal languages like Dafny, enabling automatic and mathematically provable verification of their reasoning and outcomes.
29
+ * **Reduced Human Priors**: It systematically minimizes the need for extensive human-annotated chain-of-thought and other priors for complex programming tasks.
30
+ * **Automated Data Curation**: Introduces an automatic and scalable pipeline for curating training data.
31
+ * **RL Integration**: Incorporates careful Reinforcement Learning designs integrated with feedback from formal language verifiers.
32
+ * **DafnyComp Benchmark**: Introduces `DafnyComp`, a new benchmark of compositional formal programs with auto-formalized specifications.
33
+ * **Superior Performance**: Even small Re:Form models (e.g., 0.5B) demonstrate the ability to generate syntactically valid and verifiable Dafny code, surpassing proprietary models. RL further enhances performance, achieving stronger generalization to out-of-domain tasks and outperforming strong baselines on DafnyComp.
34
+
35
+ ## Model Checkpoints
36
+
37
+ The project provides Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) fine-tuned model checkpoints. You can find them under the [Veri-Code organization](https://huggingface.co/Veri-Code) on Hugging Face.
38
+
39
+ Available checkpoints:
40
+ * **SFT Models**: `sft_0.5B`, `sft_1.5B`, `sft_3B`, `sft_7B`, `sft_14B`
41
+ * **RL Fine-tuned Model**: `14B-RL-entropy`
42
+
43
+ ## Datasets
44
+
45
+ The project utilizes the following datasets:
46
+ * **Python2Dafny**: 18k examples
47
+ * **DafnyComp**: 300 examples
48
+
49
+ These datasets are in JSON format and can be transformed into parquet files for training using the `src.data_preprocess` script available in the GitHub repository.
50
+
51
+ ## How to Use
52
+
53
+ You can easily load and use the Re:Form models with the Hugging Face `transformers` library. This example demonstrates how to generate Dafny code based on a prompt.
54
+
55
+ ```python
56
+ from transformers import AutoTokenizer, AutoModelForCausalLM
57
+ import torch
58
+
59
+ # Load the model and tokenizer (replace with your desired checkpoint, e.g., "Veri-Code/sft_14B")
60
+ model_name = "Veri-Code/sft_1.5B"
61
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
62
+ model = AutoModelForCausalLM.from_pretrained(
63
+ model_name,
64
+ torch_dtype=torch.bfloat16, # Use torch.float16 if bfloat16 is not supported by your hardware
65
+ device_map="auto",
66
+ trust_remote_code=True # Required for models with custom code or architecture extensions
67
+ )
68
+
69
+ # Example prompt for Dafny code generation (Qwen2 specific chat format)
70
+ # The model is trained to generate Dafny code given a context or problem statement.
71
+ prompt = """<|im_start|>user
72
+ Here is a simple sorting algorithm in Dafny:
73
+ ```dafny
74
+ method InsertionSort<T>(a: array<T>) modifies a
75
+ requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
76
+ ensures forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
77
+ {
78
+ ```<|im_end|>
79
+ <|im_start|>assistant
80
+ ```dafny"""
81
+
82
+ # Apply chat template and tokenize
83
+ # This model uses a chat template similar to Qwen2, ensure the prompt matches training format.
84
+ model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
85
+
86
+ # Generate Dafny code
87
+ generated_ids = model.generate(
88
+ model_inputs.input_ids,
89
+ max_new_tokens=256, # Adjust as needed for expected code length
90
+ do_sample=True,
91
+ temperature=0.7,
92
+ top_p=0.9,
93
+ eos_token_id=tokenizer.eos_token_id, # Ensure generation stops at EOS token
94
+ )
95
+
96
+ # Decode and print the generated code
97
+ generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=False)
98
+ print(generated_text)
99
+
100
+ # Note: The exact prompting format for optimal Dafny code generation and verification might
101
+ # involve specific tokens or structures as used during the model's training.
102
+ # Refer to the [GitHub repository](https://github.com/Veri-Code/ReForm) for more detailed
103
+ # usage examples, training scripts, and interaction with the Dafny environment.
104
+ ```
105
+
106
+ ## Citation
107
+
108
+ If you use this work in your research, please cite:
109
+
110
+ ```bibtex
111
+ @misc{yan2025reformreducinghuman,
112
+ title={Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny},
113
+ author={Chuanhao Yan and Fengdi Che and Xuhan Huang and Xu Xu and Xin Li and Yizhi Li and Xingwei Qu and Jingzhe Shi and Zhuangzhuang He and Chenghua Lin and Yaodong Yang and Binhang Yuan and Hang Zhao and Yu Qiao and Bowen Zhou and Jie Fu},
114
+ year={2025},
115
+ eprint={2507.16331},
116
+ archivePrefix={arXiv},
117
+ primaryClass={cs.CL},
118
+ url={https://arxiv.org/abs/2507.16331},
119
+ }
120
+ ```