# llama-cpp-python Prebuilt Wheel (Windows x64, CUDA 12.8, Gemma 3 Support)
---
🛠️ **Built with** [llama.cpp (b5192)](https://github.com/ggml-org/llama.cpp) + [CUDA 12.8](https://developer.nvidia.com/cuda-toolkit)
---
**Prebuilt `.whl` for llama-cpp-python 0.3.8 — CUDA 12.8 acceleration with full Gemma 3 model support (Windows x64).**
This repository provides a prebuilt Python wheel (`.whl`) file for **llama-cpp-python**, specifically compiled for Windows 10/11 (x64) with NVIDIA CUDA 12.8 acceleration enabled.
Building `llama-cpp-python` with CUDA support on Windows can be a complex process involving specific Visual Studio configurations, CUDA Toolkit setup, and environment variables. This prebuilt wheel aims to simplify installation for users with compatible systems.
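Assuming you have downloaded the wheel into your current directory and are running Python 3.11 on Windows x64, installation is a single pip command (the filename below is the wheel shipped in this repository; adjust the path if you saved it elsewhere):

```
pip install llama_cpp_python-0.3.8+cu128.gemma3-cp311-cp311-win_amd64.whl
```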
This build uses version `0.3.8` of the **llama-cpp-python** bindings and the underlying **llama.cpp** source as of **April 26, 2025** (build b5192). It has been verified to work with **Gemma 3 models**, correctly offloading layers to the GPU.
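As a quick check that CUDA offload is working, here is a minimal sketch using the standard `llama-cpp-python` API. The model path is a placeholder for whatever Gemma 3 GGUF file you have locally; substitute your own:

```python
from llama_cpp import Llama

# Hypothetical local path to a Gemma 3 GGUF model; replace with your own file.
llm = Llama(
    model_path="models/gemma-3-4b-it-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
    verbose=True,     # startup log should report the CUDA device and offloaded layers
)

output = llm("Explain what a prebuilt wheel is in one sentence.", max_tokens=64)
print(output["choices"][0]["text"])
```

With `verbose=True`, the load log should name your CUDA device and show the layers being offloaded; if it reports CPU-only, the CUDA build was not picked up.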
---
**Files changed**

**`.gitattributes`** — the new wheel is tracked with Git LFS:

```diff
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+llama_cpp_python-0.3.8+cu128.gemma3-cp311-cp311-win_amd64.whl filter=lfs diff=lfs merge=lfs -text
```
**`llama_cpp_python-0.3.8+cu128.gemma3-cp311-cp311-win_amd64.whl`** — new file, stored as a Git LFS pointer:

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:629e3152604b367e7e7fc51d055d821009aca02b9f8f4d65da678f5518915003
+size 62501627
```
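If you want to confirm that your downloaded wheel matches the LFS object above, a minimal sketch in Python (assuming the wheel sits in the current directory):

```python
import hashlib

# Compare the local wheel against the sha256 and size recorded in the LFS pointer.
path = "llama_cpp_python-0.3.8+cu128.gemma3-cp311-cp311-win_amd64.whl"
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest())  # expect 629e3152604b367e7e7fc51d055d821009aca02b9f8f4d65da678f5518915003
```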