qt-spyro-hf committed · verified
Commit 3c62cd1 · Parent: 2114a2d

Upload README.md

Files changed (1): README.md (+25 −20)

README.md CHANGED
@@ -10,16 +10,25 @@ tags:
  # Model Overview

  ## Description:
- CodeLlama-13B-QML is a large language model customized by the Qt Company for Fill-In-The-Middle code completion tasks in the QML programming language, especially for Qt Quick Controls compliant with Qt 6 releases. The CodeLlama-13B-QML model is designed for companies and individuals who want to self-host their LLM for HMI (Human Machine Interface) software development instead of relying on third-party hosted LLMs.

- This model reaches a score of 86% on the QML100 Fill-In-the-Middle code completion benchmark for Qt 6-compliant code. In comparison, CodeLlama-7B-QML (finetuned model from Qt) scored 79%, Claude 3.7 Sonnet scored 76%, Claude 3.5 Sonnet scored 68%, the base CodeLlama-13B scored 66%, GPT-4o scored 62%, and CodeLlama-7B scored 61%. This model was fine-tuned based on raw data from over 5000 human-created QML code snippets using the LoRa fine-tuning method. CodeLlama-13B-QML is not optimised for the creation of Qt5-release compliant, C++, or Python code.

  ## Terms of use:
  By accessing this model, you are agreeing to the Llama 2 terms and conditions of the [license](https://github.com/meta-llama/llama/blob/main/LICENSE), [acceptable use policy](https://github.com/meta-llama/llama/blob/main/USE_POLICY.md) and [Meta’s privacy policy](https://www.facebook.com/privacy/policy/). By using this model, you are furthermore agreeing to the [Qt AI Model terms & conditions](https://www.qt.io/terms-conditions/ai-services/model-use).

  ## Usage:

- CodeLlama-13B-QML is a medium-sized Language Model that requires significant computing resources to perform with inference (response) times suitable for automatic code completion. Therefore, it should be used with a GPU accelerator, either in a cloud environment such as AWS, Google Cloud, Microsoft Azure, or locally.

  Large Language Models, including CodeLlama-13B-QML, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building AI systems.
@@ -29,44 +38,37 @@ The repository contains multiple files with adapters.

  The configuration depends on the chosen cloud technology.

- Running a CodeLlama-13b-QML in the cloud requires working with Docker and vLLM for optimal performance. Make sure all required dependencies are installed (transformers, accelerate and peft modules). Use bfloat16 precision. The setup leverages the base model from Hugging Face (requiring an access token) combined with adapter weights from the repository. Using vLLM enables efficient inference with an OpenAI-compatible API endpoint, making integration straightforward. vLLM serves as a highly optimized backend that implements request batching and queuing mechanisms, providing excellent serving optimization. The docker container should be run on an instance with GPU accelerator. The configuration has been thoroughly tested on Ubuntu 22.04 LTS running NVIDIA driver with A100 80GB GPUs, demonstrating stable and efficient performance.

  ## How to run CodeLlama-13B-QML in Ollama:

- #### 1. Install Ollama
- https://ollama.com/download
-
- #### 2. Clone the model repository
-
- #### 3. Open the terminal and go to the repository
-
- #### 4. Build the model in Ollama
  ```
- ollama create codellama:13b-code-qml -f Modelfile
  ```
- The model's name must be exactly as above if one wants to use the model in Qt Creator.

- #### 5. Run the model
  ```
- ollama run codellama:13b-code-qml
  ```
- You can start writing prompts in the terminal or send curl requests now.

- Here is a curl request example:
  ```
  curl -X POST http://localhost:11434/api/generate -d '{
- "model": "codellama:13b-code-qml",
  "prompt": "<SUF>\n title: qsTr(\"Hello World\")\n}<PRE>import QtQuick\n\nWindow {\n width: 640\n height: 480\n visible: true\n<MID>",
  "stream": false,
  "options": {
    "temperature": 0,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
-   "num_predict": 300,
    "stop": ["<SUF>", "<PRE>", "</PRE>", "</SUF>", "<EOT>", "\\end", "<MID>", "</MID>", "##"]
  }
  }'
  ```

- The prompt format:
  ```
  "<SUF>{suffix}<PRE>{prefix}<MID>"
  ```
@@ -76,6 +78,9 @@ If there is no suffix, please use:
  "<PRE>{prefix}<MID>"
  ```

  ## Model Version:
  v2.0

  # Model Overview

  ## Description:
+ CodeLlama-13B-QML is a large language model customized by the Qt Company for Fill-In-The-Middle code completion tasks in the QML programming language, especially for Qt Quick Controls compliant with Qt 6 releases. The CodeLlama-13B-QML model is designed for companies and individuals that want to self-host their LLM for HMI (Human Machine Interface) software development instead of relying on third-party hosted LLMs. This can be done via cloud services or locally, via Ollama.

+ This model reaches a score of 79% on the QML100 Fill-In-the-Middle code completion benchmark for Qt 6-compliant code. In comparison, other models scored:
+
+ - CodeLlama-7B-QML: 79%
+ - Claude 3.7 Sonnet: 76%
+ - Claude 3.5 Sonnet: 68%
+ - CodeLlama 13B: 66%
+ - GPT-4o: 62%
+ - CodeLlama 7B: 61%
+
+ This model was fine-tuned on raw data from over 5,000 human-created QML code snippets using the LoRA fine-tuning method. CodeLlama-13B-QML is not optimised for the creation of Qt 5-compliant, C++, or Python code.

  ## Terms of use:
  By accessing this model, you are agreeing to the Llama 2 terms and conditions of the [license](https://github.com/meta-llama/llama/blob/main/LICENSE), [acceptable use policy](https://github.com/meta-llama/llama/blob/main/USE_POLICY.md) and [Meta’s privacy policy](https://www.facebook.com/privacy/policy/). By using this model, you are furthermore agreeing to the [Qt AI Model terms & conditions](https://www.qt.io/terms-conditions/ai-services/model-use).

  ## Usage:

+ CodeLlama-13B-QML is a medium-sized Language Model that requires significant computing resources to achieve inference (response) times suitable for automatic code completion. Therefore, it should be used with a GPU accelerator, either in a cloud environment such as AWS, Google Cloud, or Microsoft Azure, or locally.

  Large Language Models, including CodeLlama-13B-QML, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building AI systems.

  The configuration depends on the chosen cloud technology.

+ Running CodeLlama-13B-QML in the cloud requires working with Docker and vLLM for optimal performance. Make sure all required dependencies are installed (the transformers, accelerate, and peft modules). Use bfloat16 precision. The setup combines the base model from Hugging Face (which requires an access token) with the adapter weights from this repository. vLLM provides efficient inference behind an OpenAI-compatible API endpoint, making integration straightforward, and serves as a highly optimized backend with request batching and queuing. The Docker container should run on an instance with a GPU accelerator. The configuration has been thoroughly tested on Ubuntu 22.04 LTS with NVIDIA drivers and A100 80GB GPUs, demonstrating stable and efficient performance.
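
  As a minimal sketch (not an official deployment recipe), serving the base model together with the adapter through vLLM's OpenAI-compatible server could look like the following; the Docker image tag, base-model ID, and adapter path are assumptions:
  ```
  # Hypothetical sketch: serve CodeLlama-13B with the QML LoRA adapter via vLLM's
  # OpenAI-compatible server. Image tag, model ID, and adapter path are assumptions.
  docker run --gpus all -p 8000:8000 \
    -e HUGGING_FACE_HUB_TOKEN=<your_hf_token> \
    -v /path/to/adapter:/adapters/qml \
    vllm/vllm-openai:latest \
    --model codellama/CodeLlama-13b-hf \
    --dtype bfloat16 \
    --enable-lora \
    --lora-modules qml=/adapters/qml
  ```
  Completions can then be requested from http://localhost:8000/v1/completions with "model": "qml".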

  ## How to run CodeLlama-13B-QML in Ollama:

+ We have preloaded the model to Ollama for your convenience.

+ #### 1. Download and install Ollama from Ollama's web page (if you are not already using it):
  ```
+ https://ollama.com/download
  ```

+ #### 2. Run the model with the following command in Ollama's CLI:
  ```
+ ollama run theqtcompany/codellama-13b-qml
  ```
+ Now you can set CodeLlama-13B-QML as the LLM for code completion in the Qt AI Assistant or other coding assistants. If you want to test the model, you can type prompts in Ollama's CLI or send curl requests to Ollama's HTTP API, as shown below.

  ```
  curl -X POST http://localhost:11434/api/generate -d '{
+ "model": "theqtcompany/codellama-13b-qml",
  "prompt": "<SUF>\n title: qsTr(\"Hello World\")\n}<PRE>import QtQuick\n\nWindow {\n width: 640\n height: 480\n visible: true\n<MID>",
  "stream": false,
  "options": {
    "temperature": 0,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
+   "num_predict": 500,
    "stop": ["<SUF>", "<PRE>", "</PRE>", "</SUF>", "<EOT>", "\\end", "<MID>", "</MID>", "##"]
  }
  }'
  ```

+ In general, the prompt format for CodeLlama-13B-QML is:
  ```
  "<SUF>{suffix}<PRE>{prefix}<MID>"
  ```
  Everything after the cursor position becomes the {suffix} and everything before it becomes the {prefix}; the curl example above instantiates exactly this template.

  If there is no suffix, please use:
  ```
  "<PRE>{prefix}<MID>"
  ```
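
  For instance, a prefix-only completion request could look like the following sketch (the QML fragment and parameter values are illustrative, not from the original card):
  ```
  curl -X POST http://localhost:11434/api/generate -d '{
  "model": "theqtcompany/codellama-13b-qml",
  "prompt": "<PRE>import QtQuick\n\nWindow {\n width: 640\n<MID>",
  "stream": false,
  "options": {
    "temperature": 0,
    "num_predict": 100,
    "stop": ["<SUF>", "<PRE>", "<MID>", "<EOT>"]
  }
  }'
  ```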

+ ## Modify and Adapt CodeLlama-13B-QML:
+
+ The Hugging Face repository contains all the necessary components, including the .safetensors files and the tokenizer configuration. This gives you everything needed to modify the model for various environments, adapt it to your specific requirements, or train it on your custom dataset.
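
  As a minimal sketch, the repository can be fetched for local experimentation with the Hugging Face CLI; the repository ID below is an assumption, so replace it with this repository's actual ID:
  ```
  # Hypothetical sketch: download the model files (.safetensors, tokenizer config)
  # for local fine-tuning. The repo ID is an assumption.
  pip install -U "huggingface_hub[cli]"
  huggingface-cli download QtGroup/CodeLlama-13B-QML --local-dir ./codellama-13b-qml
  ```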

  ## Model Version:
  v2.0