---
license: apache-2.0
base_model:
- ByteDance-Seed/Seed-Coder-8B-Base
---

# Seed-Coder-8B-Instruct

## Introduction
Seed-Coder-8B-Instruct is an 8-billion-parameter model instruction-tuned specifically for code generation, code reasoning, and code understanding, built to give developers high-quality, efficient coding assistance. It features:
- Trained on a **massively curated corpus**, in which **an LLM-based filter** selects **high-quality real-world code**, **text-code alignment data**, and **synthetic datasets**, yielding cleaner and more useful data than traditional heuristic-based curation.
- Strong performance across **code generation**, **bug fixing**, and **reasoning** tasks, rivaling or surpassing larger open-source code models.
- **Instruction-tuned** to reliably follow user intent across a diverse range of coding and reasoning prompts.
- **Long-context handling** up to 32K tokens, enabling work on complex multi-file projects and detailed coding tasks (see the sketch after this list for a quick way to check prompt length).

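As a quick illustration of the long-context point above, the sketch below uses the model's tokenizer to count the tokens in a prompt and compare the count against an assumed 32K window; the file name is a placeholder, and the exact limit should be confirmed against the model's configuration:

```python
from transformers import AutoTokenizer

model_id = "ByteDance-Seed/Seed-Coder-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

MAX_CONTEXT = 32768  # assumed 32K context window; verify against the model config
prompt = open("my_project_dump.py").read()  # hypothetical multi-file project dump

# Count prompt tokens to see whether the prompt fits in the assumed context window.
n_tokens = len(tokenizer(prompt)["input_ids"])
status = "fits within" if n_tokens <= MAX_CONTEXT else "exceeds"
print(f"{n_tokens} tokens, which {status} the {MAX_CONTEXT}-token window")
```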
## Requirements
You will need to install the latest versions of `transformers` and `accelerate`:

```bash
pip install -U transformers accelerate
```

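To confirm that the upgrade took effect, here is a quick sanity check of the installed versions (nothing model-specific, just the two packages):

```python
import accelerate
import transformers

# Print the installed versions of the two packages required above.
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
```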
## Quickstart

Here is a simple example demonstrating how to load the model and generate code using the Hugging Face `pipeline` API:

```python
import transformers
import torch

model_id = "ByteDance-Seed/Seed-Coder-8B-Instruct"

# Load the model as a chat-style text-generation pipeline in bfloat16.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a quick sort algorithm."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)

# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
```

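If you prefer to manage tokenization and decoding yourself rather than using the `pipeline` wrapper, here is a minimal sketch using `AutoTokenizer` and `AutoModelForCausalLM` with the model's chat template (assuming the checkpoint ships one, as instruction-tuned checkpoints typically do):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/Seed-Coder-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a quick sort algorithm."},
]

# Render the conversation with the model's chat template and append the assistant prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```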
## Evaluation

Seed-Coder-8B-Instruct demonstrates strong performance across a variety of coding benchmarks, showing:
- Competitive or superior results compared to similarly sized open-source code models.
- Robustness across different programming languages and domains.
- The ability to understand, reason about, and repair complex code snippets.

For detailed results, please check our [📑 paper](https://arxiv.org/pdf/xxx.xxxxx).

## Citation

If you find our work helpful, please consider citing it:

```bibtex
@article{zhang2025seedcoder,
  title={Seed-Coder: Let the Code Model Curate Data for Itself},
  author={Xxx},
  year={2025},
  eprint={2504.xxxxx},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/xxxx.xxxxx},
}
```