---
pipeline_tag: text-generation
tags:
- code
- CUDA
---

## CudaLLM: A Language Model for High-Performance CUDA Kernel Generation

### Model Description

cudaLLM-8B is a language model for generating high-performance, syntactically correct CUDA kernels. It is based on Qwen3-8B and has undergone a two-stage training process (supervised fine-tuning followed by reinforcement learning) to master the complexities of parallel programming for GPUs.

**Performance on KernelBench** (Bo*n* = best-of-*n* sampling):

|         | Bo1   | Bo2 | Bo4 | Bo8 | Bo16 |
|---------|-------|-----|-----|-----|------|
| Level-1 | 79.75 | 83  | 84  | 86  | 87   |
| Level-2 | 67.30 | 70  | 71  | 72  | 73   |
| Level-3 | 20.83 | 26  | 30  | 34  | 36   |
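
A minimal usage sketch with Hugging Face Transformers is shown below. The repository id, prompt wording, and sampling settings are illustrative assumptions rather than officially published values, and the model is assumed to follow the standard Qwen3 chat template.

```python
# Usage sketch (assumptions: repo id "ByteDance-Seed/cudaLLM-8B",
# standard Qwen3 chat template, illustrative sampling settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/cudaLLM-8B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": "Write an optimized CUDA kernel (plus host-side launch code) "
               "that computes C = A + B for float32 vectors of length n.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```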

### Training Procedure

The model was trained with the verl library and evaluated on the following data (a loading sketch for the parquet files follows this list):

- SFT Dataset: a high-quality dataset of CUDA problem-solution pairs ([sft_cuda_llm_r1.parquet](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)), originally generated by DeepSeek R1, DeepSeek Coder-7B, and Qwen2-32B.
- RL Dataset: a refined dataset ([rl_cuda_llm_0424.parquet](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)) used to provide performance-based rewards during the RL stage.
- Evaluation Dataset: the model's performance was benchmarked on the KernelBench dataset.
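
A sketch of one way to pull those files with the Hugging Face Datasets library. The repo id comes from the links above, but the column schema is not documented here, so inspect it after loading:

```python
# Sketch: load the SFT and RL parquet files from the dataset repo linked above.
# The split name and column layout are assumptions; print the datasets to inspect them.
from datasets import load_dataset

sft = load_dataset(
    "ByteDance-Seed/cudaLLM-data",
    data_files="sft_cuda_llm_r1.parquet",
    split="train",
)
rl = load_dataset(
    "ByteDance-Seed/cudaLLM-data",
    data_files="rl_cuda_llm_0424.parquet",
    split="train",
)
print(sft)  # inspect features before building a training pipeline
print(rl)
```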

### Intended Use and Limitations

#### Intended Use

The primary use of CudaLLM is to assist developers in writing and optimizing high-performance CUDA kernels. It can be used for:

- Accelerating scientific computing and machine learning workloads.
- Serving as a co-pilot or productivity tool for HPC and CUDA developers.
- Supporting research into AI-driven code generation and optimization.

#### Limitations and Bias

- **Correctness is not guaranteed:** While trained to produce correct code, the model's output should always be rigorously tested and verified before deployment in production systems (see the verification sketch after this list).
- **Security risks:** The generated code is not guaranteed to be secure. Never run model-generated code from an untrusted source without careful inspection.
- **Performance variability:** Kernel performance can vary significantly with the target GPU architecture, input data sizes, and compiler version; the generated code may require further manual tuning.
- **Specialized domain:** This model is highly specialized for CUDA code generation, so its performance on general-purpose programming tasks or natural-language conversation will be limited.
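
For the correctness point above, a minimal verification sketch: JIT-compile a generated kernel with PyTorch's inline extension loader and compare it against a PyTorch reference. The kernel body and the wrapper name `add_vectors` are hypothetical stand-ins for model output, not part of this model card.

```python
# Verification sketch: compile a (hypothetical) generated kernel and
# check it against a PyTorch reference before trusting it.
import torch
from torch.utils.cpp_extension import load_inline

# Stand-in for model output; in practice this string comes from cudaLLM-8B.
cuda_source = r"""
__global__ void add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

torch::Tensor add_vectors(torch::Tensor a, torch::Tensor b) {
    auto c = torch::empty_like(a);
    int n = a.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(
        a.data_ptr<float>(), b.data_ptr<float>(), c.data_ptr<float>(), n);
    return c;
}
"""

mod = load_inline(
    name="generated_kernel",
    cpp_sources="torch::Tensor add_vectors(torch::Tensor a, torch::Tensor b);",
    cuda_sources=cuda_source,
    functions=["add_vectors"],
)

a = torch.randn(1 << 20, device="cuda")
b = torch.randn(1 << 20, device="cuda")
torch.testing.assert_close(mod.add_vectors(a, b), a + b)  # correctness check
print("kernel matches the PyTorch reference")
```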