# AI Chatbot System Technical Documentation
---
## 1. Executive Summary
This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.
---
## 2. System Capabilities
- **Natural Language Understanding**: Implements advanced parsing to interpret user intents and entities.
- **Policy Enforcement**: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
- **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration.
- **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines.
---
## 3. Architectural Components
### 3.1 Custom Language Model
- **Model Architecture**: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
- **Configuration File**: Defined using Ollama’s Modelfile format (stored here as `model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates; a brief sketch appears at the end of this section.
- **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) to facilitate efficient loading and inference.
```bash
git clone https://github.com/mattjamo/OllamaToGGUF.git
cd OllamaToGGUF
python OllamaToGGUF.py
```
- **Repository Deployment**: Published to the Hugging Face Model Hub via the `huggingface-cli`, with commit metadata linked to JIRA issue tracking.
```bash
huggingface-cli upload <your-username>/<your-model-name> . .   # <repo-id> <local-path> <path-in-repo>
```
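A minimal sketch of what such a Modelfile-style definition might contain, using Ollama's standard directives (`FROM`, `PARAMETER`, `SYSTEM`, `TEMPLATE`); the parameter values and prompts below are illustrative placeholders, not the production configuration:
```plaintext
FROM mistral:7b
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are a helpful, policy-compliant assistant."
TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:"""
```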
### 3.2 NVIDIA NeMo Guardrails
- **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
- **Colang Files**: All `.co` artifacts are written in the Colang modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types, illustrated in the sketch after the directory layout, are:
- **User Message Block** (`define user ...`)
- **Flow Block** (`define flow ...`)
- **Bot Message Block** (`define bot ...`)
- **Directory Layout**:
```plaintext
config/
├── rails/ # Colang flow definitions (.co)
├── prompts.yml # Prompt templates and trigger mappings
├── config.yml # Guardrails engine settings and routing rules
└── actions.py # Custom callbacks for external services
```
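A minimal sketch of a `.co` rails file showing the three block types working together (the intent names and wording are illustrative, not taken from the production rails):
```colang
define user ask off topic
  "What do you think about politics?"
  "Can you give me financial advice?"

define bot refuse off topic
  "Sorry, I can only help with questions about this service."

define flow off topic
  user ask off topic
  bot refuse off topic
```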
### 3.3 Orchestration with n8n
* **Webhook Listener**: Exposes an HTTP POST endpoint that receives JSON-formatted user queries.
* **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
* **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
* **Response Dispatcher**: Consolidates model outputs and returns them to clients in standardized JSON responses.
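A minimal client-side sketch of this contract (the webhook path, port, and field names are assumptions for illustration; the real URL is defined by the workflow's Webhook node):
```python
import requests

# Hypothetical n8n webhook endpoint exposed by the Webhook Listener node.
N8N_WEBHOOK_URL = "http://localhost:5678/webhook/chatbot"

# JSON-formatted user query posted to the workflow.
payload = {"message": "What are your support hours?"}

resp = requests.post(N8N_WEBHOOK_URL, json=payload, timeout=30)
resp.raise_for_status()

# The Response Dispatcher returns a standardized JSON response.
print(resp.json())
```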
### 3.4 Open WebUI Front-End
* **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface.
* **Features**:
* Real-time streaming of text and multimedia.
* Quick-reply button generation.
* Resilient error handling for network or validation interruptions.
---
## 4. Deployment Workflow
### 4.1 Prerequisites
* Docker Engine & Docker Compose
* Node.js (v16+) and npm
* Python 3.10+ with `nemo-guardrails`
* Ollama CLI for model export
### 4.2 Model Preparation
1. **Modelfile Definition**: Create `model.yaml` with the base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts.
2. **Model Conversion**: Convert the local Ollama model to `.gguf` with the OllamaToGGUF utility described in Section 3.1:
```bash
cd OllamaToGGUF
python OllamaToGGUF.py
```
3. **Artifact Publication**:
```bash
git clone https://huggingface.co/<org>/mistral-7b-gguf
cp model.gguf mistral-7b-gguf/
cd mistral-7b-gguf
git add model.gguf
git commit -m "JIRA-###: Add Mistral 7B gguf model"
git push
```
### 4.3 Guardrails Initialization
1. Construct the `config/` directory structure as outlined in Section 3.2.
2. Populate `rails/` with Colang `.co` definitions.
3. Install dependencies:
```bash
pip install nemo-guardrails
```
4. Launch the Guardrails engine:
```bash
nemoguardrails server --config config/
```
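Before wiring the engine into n8n or FastAPI, the configuration can be smoke-tested directly from Python; this is a minimal sketch using the same `nemoguardrails` API shown in Section 4.6:
```python
from nemoguardrails import RailsConfig, LLMRails

# Load the same config/ directory used by the running engine.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Single-turn check that the rails load and produce a completion.
response = rails.generate(messages=[{"role": "user", "content": "Hello"}])
print(response["content"])
```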
### 4.4 n8n Orchestration Deployment
1. Place `chatbot.json` workflow definition in `n8n/workflows/`.
2. Start n8n via Docker Compose:
```bash
docker-compose up -d n8n
```
### 4.5 Front-End Deployment
```bash
cd open-webui
npm install
# Update API endpoint in config
npm run dev
```
### 4.6 FastAPI Integration
Integrate the model and guardrails engine behind a FastAPI service:
```python
from fastapi import FastAPI
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration from the config/ directory
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    # Run the user message through the guardrails-wrapped model.
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=5000)
```
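Once the service is running, the endpoint can be exercised with a short client call; the port matches the `uvicorn.run` call above and the question is only illustrative:
```python
import requests

# Call the /chat endpoint exposed by the FastAPI service defined above.
resp = requests.post(
    "http://localhost:5000/chat",
    json={"message": "What can you help me with?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```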
---
## 5. Operational Procedures
1. **Receive User Input**: Front-end transmits message to n8n.
2. **Enforce Policies**: Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues.
3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint.
4. **Deliver Output**: n8n returns the structured response to the client.
---
## 6. Maintenance and Diagnostics
* **Model Updates**: Regenerate `.gguf` artifacts and update the repository as described in Section 4.2.
* **Guardrail Tuning**: Modify Colang `.co` definitions, test via CLI, and redeploy engine.
* **Workflow Monitoring**: Utilize n8n’s built-in analytics dashboard for node-level logs.
* **UI Troubleshooting**: Inspect browser developer console for errors and verify API endpoint configurations.
---
*Document generated based on source materials.*