# AI Chatbot System Technical Documentation
---
## 1. Executive Summary
This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.
---
## 2. System Capabilities
- **Natural Language Understanding**: Implements advanced parsing to interpret user intents and entities.
- **Policy Enforcement**: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
- **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration.
- **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines.
---
## 3. Architectural Components
### 3.1 Custom Language Model
- **Model Architecture**: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
- **Configuration File**: Defined using Ollama’s Modelfile format (stored here as `model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates; a brief sketch appears at the end of this section.
- **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) to facilitate efficient loading and inference.
```bash
git clone https://github.com/mattjamo/OllamaToGGUF.git
cd OllamaToGGUF
python OllamaToGGUF.py
```
- **Repository Deployment**: Published to the Hugging Face Model Hub via the `huggingface-cli`, with commit metadata linked to JIRA issue tracking.
```bash
huggingface-cli upload <your-username>/<your-model-name> . .   # <repo-id> <local-path> <path-in-repo>
```
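A minimal sketch of what such a Modelfile-style definition might contain, using Ollama's standard directives (`FROM`, `PARAMETER`, `SYSTEM`, `TEMPLATE`); the parameter values and prompts below are illustrative placeholders, not the production configuration:
```plaintext
FROM mistral:7b
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are a helpful, policy-compliant assistant."
TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:"""
```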
### 3.2 NVIDIA NeMo Guardrails
- **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
- **Colang Files**: All `.co` artifacts are written in the Colang modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types, illustrated in the sketch after the directory layout, are:
- **User Message Block** (`define user ...`)
- **Flow Block** (`define flow ...`)
- **Bot Message Block** (`define bot ...`)
- **Directory Layout**:
```plaintext
config/
├── rails/ # Colang flow definitions (.co)
├── prompts.yml # Prompt templates and trigger mappings
├── config.yml # Guardrails engine settings and routing rules
└── actions.py # Custom callbacks for external services
```
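A minimal sketch of a `.co` rails file showing the three block types working together (the intent names and wording are illustrative, not taken from the production rails):
```colang
define user ask off topic
  "What do you think about politics?"
  "Can you give me financial advice?"

define bot refuse off topic
  "Sorry, I can only help with questions about this service."

define flow off topic
  user ask off topic
  bot refuse off topic
```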
### 3.3 Orchestration with n8n
* **Webhook Listener**: Exposes an HTTP POST endpoint that receives JSON-formatted user queries.
* **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
* **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
* **Response Dispatcher**: Consolidates model outputs and returns them to clients in standardized JSON responses.
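A minimal client-side sketch of this contract (the webhook path, port, and field names are assumptions for illustration; the real URL is defined by the workflow's Webhook node):
```python
import requests

# Hypothetical n8n webhook endpoint exposed by the Webhook Listener node.
N8N_WEBHOOK_URL = "http://localhost:5678/webhook/chatbot"

# JSON-formatted user query posted to the workflow.
payload = {"message": "What are your support hours?"}

resp = requests.post(N8N_WEBHOOK_URL, json=payload, timeout=30)
resp.raise_for_status()

# The Response Dispatcher returns a standardized JSON response.
print(resp.json())
```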
### 3.4 Open WebUI Front-End
* **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface.
* **Features**:
* Real-time streaming of text and multimedia.
* Quick-reply button generation.
* Resilient error handling for network or validation interruptions.
---
## 4. Deployment Workflow
### 4.1 Prerequisites
* Docker Engine & Docker Compose
* Node.js (v16+) and npm
* Python 3.10+ with `nemo-guardrails`
* Ollama CLI for model export
### 4.2 Model Preparation
1. **Modelfile Definition**: Create `model.yaml` with the base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts.
2. **Model Conversion**: Convert the local Ollama model to `.gguf` with the OllamaToGGUF utility described in Section 3.1:
```bash
cd OllamaToGGUF
python OllamaToGGUF.py
```
3. **Artifact Publication**:
```bash
git clone https://huggingface.co/<org>/mistral-7b-gguf
cp model.gguf mistral-7b-gguf/
cd mistral-7b-gguf
git add model.gguf
git commit -m "JIRA-###: Add Mistral 7B gguf model"
git push
```
### 4.3 Guardrails Initialization
1. Construct the `config/` directory structure as outlined in Section 3.2.
2. Populate `rails/` with Colang `.co` definitions.
3. Install dependencies:
```bash
pip install nemo-guardrails
```
4. Launch the Guardrails engine:
```bash
nemoguardrails server --config config/
```
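Before wiring the engine into n8n or FastAPI, the configuration can be smoke-tested directly from Python; this is a minimal sketch using the same `nemoguardrails` API shown in Section 4.6:
```python
from nemoguardrails import RailsConfig, LLMRails

# Load the same config/ directory used by the running engine.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Single-turn check that the rails load and produce a completion.
response = rails.generate(messages=[{"role": "user", "content": "Hello"}])
print(response["content"])
```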
### 4.4 n8n Orchestration Deployment
1. Place `chatbot.json` workflow definition in `n8n/workflows/`.
2. Start n8n via Docker Compose:
```bash
docker-compose up -d n8n
```
### 4.5 Front-End Deployment
```bash
cd open-webui
npm install
# Update API endpoint in config
npm run dev
```
### 4.6 FastAPI Integration
Integrate the model and guardrails engine behind a FastAPI service:
```python
from fastapi import FastAPI
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration from the config/ directory
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    # Run the user message through the guardrails-wrapped model.
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=5000)
```
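Once the service is running, the endpoint can be exercised with a short client call; the port matches the `uvicorn.run` call above and the question is only illustrative:
```python
import requests

# Call the /chat endpoint exposed by the FastAPI service defined above.
resp = requests.post(
    "http://localhost:5000/chat",
    json={"message": "What can you help me with?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```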
---
## 5. Operational Procedures
1. **Receive User Input**: Front-end transmits message to n8n.
2. **Enforce Policies**: Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues.
3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint.
4. **Deliver Output**: n8n returns the structured response to the client.
---
## 6. Maintenance and Diagnostics
* **Model Updates**: Regenerate `.gguf` artifacts and update the repository as described in Section 4.2.
* **Guardrail Tuning**: Modify Colang `.co` definitions, test via CLI, and redeploy engine.
* **Workflow Monitoring**: Utilize n8n’s built-in analytics dashboard for node-level logs.
* **UI Troubleshooting**: Inspect browser developer console for errors and verify API endpoint configurations.
---
*Document generated based on source materials.*