apple/DiffuCoder-7B-Base

text-diffusion-model

diffusion large language model

Sansa committed on Jul 4

Commit bc21947 · verified · 1 Parent(s): 77fbe3f

Update README.md

add inference example

Files changed (1)

README.md +48 -1

@@ -16,12 +16,59 @@ The DiffuCoder-7B-Base model is our foundational masked diffusion LLM for code g
 - Benchmarks: Strong baseline performance on HumanEval, MBPP and BigCodeBench.
 #### More details and usage examples:
 - Paper: [DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation](https://arxiv.org/abs/2506.20639)
 - GitHub: https://github.com/apple/ml-diffucoder
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+model_path = "apple/DiffuCoder-7B-Base"
+model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+model = model.to("cuda").eval()
+prompt = """
+from typing import List
+def has_close_elements(numbers: List[float], threshold: float) -> bool:
+    \"\"\"
+    Check if in given list of numbers, are any two numbers closer to each other than given threshold.
+    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
+    False
+    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
+    True
+    \"\"\"
+"""
+TOKEN_PER_STEP = 1 # diffusion timesteps * TOKEN_PER_STEP = total new tokens
+inputs = tokenizer(prompt, return_tensors="pt")
+input_ids = inputs.input_ids.to(device="cuda")
+attention_mask = inputs.attention_mask.to(device="cuda")
+output = model.diffusion_generate(
+    input_ids,
+    attention_mask=attention_mask,
+    max_new_tokens=256,
+    output_history=True,
+    return_dict_in_generate=True,
+    steps=256//TOKEN_PER_STEP,
+    temperature=0.2,
+    top_p=0.95,
+    alg="entropy",
+    alg_temp=0.,
+)
+generations = [
+    tokenizer.decode(g[len(p) :].tolist())
+    for p, g in zip(input_ids, output.sequences)
+]
+print(generations[0].split(tokenizer.eos_token)[0])
+```
 #### Acknowledgement
 To power this HuggingFace model release, we reuse [Dream](https://huggingface.co/Dream-org/Dream-v0-Base-7B)'s modeling architecture and generation utils.
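
Note on the added example: the README snippet ties the step count to the token budget through `steps=256//TOKEN_PER_STEP`, then drops the prompt tokens from each returned sequence and cuts the decoded text at the first EOS token. The sketch below isolates that arithmetic and post-processing with plain Python values so it can be checked without loading the model. The helper names (`steps_for`, `strip_prompt`, `truncate_at_eos`) and the `"<EOS>"` placeholder string are hypothetical and are not part of the DiffuCoder or Dream code; the real EOS string comes from `tokenizer.eos_token`.

```python
from typing import List, Sequence


def steps_for(max_new_tokens: int, tokens_per_step: int) -> int:
    """Step count used in the README: max_new_tokens // tokens_per_step.

    Hypothetical helper. With floor division, steps * tokens_per_step can be
    smaller than max_new_tokens when the budget is not evenly divisible.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    return max_new_tokens // tokens_per_step


def strip_prompt(prompt_ids: Sequence[int], sequence_ids: Sequence[int]) -> List[int]:
    """Keep only the newly generated ids, mirroring g[len(p):] in the README."""
    return list(sequence_ids[len(prompt_ids):])


def truncate_at_eos(text: str, eos_token: str) -> str:
    """Keep the text before the first EOS marker, mirroring split(eos_token)[0]."""
    return text.split(eos_token)[0]


# Illustrative values only; "<EOS>" is a placeholder, not the model's real token.
new_ids = strip_prompt([11, 12, 13], [11, 12, 13, 21, 22])
completion = truncate_at_eos("    return False<EOS><EOS>", "<EOS>")
```

With `TOKEN_PER_STEP = 1` this gives 256 steps for 256 new tokens; raising it to 2 halves the step count to 128. A value that does not divide the budget evenly (for example 3, giving 85 steps) leaves `steps * TOKEN_PER_STEP` below `max_new_tokens`, so it seems safer to pick a divisor of the budget unless the generation utilities document how the remainder is handled.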