# AI Chatbot System Technical Documentation
---
## 1. Executive Summary
This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.
---
## 2. System Capabilities
- **Natural Language Understanding**: Implements advanced parsing to interpret user intents and entities.
- **Policy Enforcement**: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
- **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration.
- **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines.
---
## 3. Architectural Components
### 3.1 Custom Language Model
- **Model Architecture**: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
- **Configuration File**: Defined using Ollama’s Modelfile format (`model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates.
- **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) to facilitate efficient loading and inference.
``` bash
git clone https://github.com/mattjamo/OllamaToGGUF.git
cd OllamaToGGUF
python OllamaToGGUF.py
```
- **Repository Deployment**: Published to the Hugging Face Model Hub via the Hugging Face CLI, with commit metadata linked to JIRA issue tracking.
``` bash
huggingface-cli upload <your-username>/<your-model-name> . .
```
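The same publication step can also be scripted with the `huggingface_hub` Python client; the repository id and commit message below are placeholders mirroring the CLI example above, and a valid token (`HF_TOKEN` or `huggingface-cli login`) is assumed.
```python
# Hypothetical sketch: publish the packaged .gguf artifact via the huggingface_hub client.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path=".",                              # local directory containing model.gguf
    repo_id="<your-username>/<your-model-name>",  # placeholder repository id
    repo_type="model",
    commit_message="JIRA-###: Add Mistral 7B gguf model",  # mirror the JIRA-linked metadata
)
```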
### 3.2 NVIDIA NeMo Guardrails
- **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
- **Colang Files**: All `.co` artifacts are written in the Colang modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types, illustrated in the sketch after the directory layout below, are:
- **User Message Block** (`define user ...`)
- **Flow Block** (`define flow ...`)
- **Bot Message Block** (`define bot ...`)
- **Directory Layout**:
```plaintext
config/
├── rails/ # Colang flow definitions (.co)
├── prompts.yml # Prompt templates and trigger mappings
├── config.yml # Guardrails engine settings and routing rules
└── actions.py # Custom callbacks for external services
```
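For illustration, the sketch below shows how the three block types fit together and how a configuration can be loaded through the `nemoguardrails` Python API. The greeting flow is hypothetical, and the `engine`/`model` values are placeholders that must match the deployment's actual inference backend; in the real layout these definitions live in `rails/*.co` and `config.yml`.
```python
# Minimal, hypothetical Colang configuration loaded in-memory for inspection.
from nemoguardrails import RailsConfig

colang_content = """
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
"""

yaml_content = """
models:
  - type: main
    engine: ollama     # placeholder engine
    model: mistral-7b  # placeholder model name
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
print(config.flows)  # parsed flow definitions
```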
### 3.3 Orchestration with n8n
* **Webhook Listener**: Exposes an HTTP POST endpoint that receives JSON-formatted user queries (see the client sketch after this list).
* **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
* **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
* **Response Dispatcher**: Consolidates model outputs and returns them to clients in standardized JSON responses.
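A rough client-side sketch of the round trip through this pipeline is shown below; the webhook URL and the request/response field names are assumptions and must match the fields configured in the actual n8n workflow.
```python
# Hypothetical client call against the n8n Webhook Listener described above.
import requests

N8N_WEBHOOK_URL = "http://localhost:5678/webhook/chatbot"  # placeholder endpoint

payload = {"message": "What are your support hours?"}  # assumed request schema
resp = requests.post(N8N_WEBHOOK_URL, json=payload, timeout=30)
resp.raise_for_status()

# The Response Dispatcher returns a standardized JSON body, e.g. {"response": "..."}.
print(resp.json())
```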
### 3.4 Open WebUI Front-End
* **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface.
* **Features**:
* Real-time streaming of text and multimedia.
* Quick-reply button generation.
* Resilient error handling for network or validation interruptions.
---
## 4. Deployment Workflow
### 4.1 Prerequisites
* Docker Engine & Docker Compose
* Node.js (v16+) and npm
* Python 3.10+ with `nemoguardrails`
* Ollama CLI for model export
### 4.2 Model Preparation
1. **Modelfile Definition**: Create `model.yaml` with the base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts.
2. **Model Conversion**: Convert the local Ollama model to a `.gguf` artifact with the OllamaToGGUF utility from Section 3.1:
```bash
cd OllamaToGGUF
python OllamaToGGUF.py   # converts the installed Ollama model to a .gguf artifact (model.gguf)
```
3. **Artifact Publication**:
```bash
git clone https://huggingface.co/<org>/mistral-7b-gguf
cp model.gguf mistral-7b-gguf/
cd mistral-7b-gguf
git lfs install             # large .gguf artifacts must be stored via Git LFS
git lfs track "*.gguf"      # skip if .gitattributes already tracks *.gguf
git add .gitattributes model.gguf
git commit -m "JIRA-###: Add Mistral 7B gguf model"
git push
```
### 4.3 Guardrails Initialization
1. Construct the `config/` directory structure as outlined in Section 3.2.
2. Populate `rails/` with Colang `.co` definitions.
3. Install dependencies:
```bash
pip install nemoguardrails
```
4. Launch the Guardrails engine (the `nemoguardrails` CLI serves the configuration directory, not a single file):
```bash
nemoguardrails server --config config/
```
### 4.4 n8n Orchestration Deployment
1. Place `chatbot.json` workflow definition in `n8n/workflows/`.
2. Start n8n via Docker Compose:
```bash
docker-compose up -d n8n
```
### 4.5 Front-End Deployment
```bash
cd open-webui
npm install
# Update API endpoint in config
npm run dev
```
### 4.6 FastAPI Integration
Integrate the model and guardrails engine behind a FastAPI service:
```python
from fastapi import FastAPI
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration from the config/ directory (Section 3.2)
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    # Route the user message through the guardrails-wrapped model
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=5000)
```
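A minimal sketch for exercising the endpoint locally, assuming the service above is saved as `main.py`; `TestClient` runs the app in-process, so no separate `uvicorn` server is required.
```python
# Hypothetical smoke test for the /chat route using FastAPI's TestClient.
from fastapi.testclient import TestClient

from main import app  # assumes the FastAPI service above is saved as main.py

client = TestClient(app)
resp = client.post("/chat", json={"message": "Hello!"})
print(resp.status_code, resp.json())
```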
---
## 5. Operational Procedures
1. **Receive User Input**: Front-end transmits message to n8n.
2. **Enforce Policies**: Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues.
3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint.
4. **Deliver Output**: n8n returns the structured response to the client.
---
## 6. Maintenance and Diagnostics
* **Model Updates**: Re-export `.gguf` artifacts and update repository as per Section 4.2.
* **Guardrail Tuning**: Modify Colang `.co` definitions, test via CLI, and redeploy engine.
* **Workflow Monitoring**: Utilize n8n’s built-in analytics dashboard for node-level logs.
* **UI Troubleshooting**: Inspect browser developer console for errors and verify API endpoint configurations.
---
*Document generated based on source materials.*