|
|
|
# AI Chatbot System Technical Documentation |
|
|
|
--- |
|
|
|
## 1. Executive Summary |
|
|
|
This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability. |
|
|
|
--- |
|
|
|
## 2. System Capabilities |
|
|
|
- **Natural Language Understanding**: Implements advanced parsing to interpret user intents and entities. |
|
- **Policy Enforcement**: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements. |
|
- **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration. |
|
- **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines. |
|
|
|
--- |
|
|
|
## 3. Architectural Components |
|
|
|
### 3.1 Custom Language Model |
|
|
|
- **Model Architecture**: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks. |
|
- **Configuration File**: Defined using Ollama’s Modelfile format (stored in this project as `model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates.
|
- **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) to facilitate efficient loading and inference. |
|
|
|
```bash
git clone https://github.com/mattjamo/OllamaToGGUF.git
cd OllamaToGGUF
python OllamaToGGUF.py
```
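
The Modelfile consumed by the conversion step above might be sketched as follows; the base tag, parameter values, and system prompt are illustrative placeholders, not the project's actual configuration:

```plaintext
FROM mistral:7b

# Sampling hyperparameters (example values)
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# Role-based system prompt (illustrative)
SYSTEM """You are a helpful, policy-compliant support assistant."""
```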
|
|
|
- **Repository Deployment**: Published to Hugging Face Model Hub via automated CLI processes, with commit metadata linked to JIRA issue tracking. |
|
|
|
```bash
huggingface-cli upload <your-username>/<your-model-name> . .
```
|
|
|
### 3.2 NVIDIA NeMo Guardrails |
|
|
|
- **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues. |
|
- **Colang Files**: All `.co` artifacts are written in the Colang modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types are:
|
- **User Message Block** (`define user ...`) |
|
- **Flow Block** (`define flow ...`) |
|
- **Bot Message Block** (`define bot ...`) |
|
- **Directory Layout**: |
|
|
|
```plaintext
config/
├── rails/        # Colang flow definitions (.co)
├── prompts.yml   # Prompt templates and trigger mappings
├── config.yml    # Guardrails engine settings and routing rules
└── actions.py    # Custom callbacks for external services
```
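
A minimal `.co` file combining the three block types could look like the following; the intent name and canned wording are illustrative, not taken from the project's rails:

```colang
define user ask support hours
  "When is support available?"
  "What are your opening hours?"

define bot inform support hours
  "Our support team is available 24/7."

define flow support hours
  user ask support hours
  bot inform support hours
```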
|
|
|
|
|
### 3.3 Orchestration with n8n |
|
|
|
* **Webhook Listener**: Exposes HTTP POST endpoint to receive JSON-formatted user queries. |
|
* **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions. |
|
* **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits generated output. |
|
* **Response Dispatcher**: Consolidates model outputs and returns them to clients in standardized JSON responses. |
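
The request/response contract between the webhook listener and the response dispatcher can be sketched as below; the field names (`user_id`, `status`) are assumptions for illustration, not a published schema:

```python
import json

# JSON body a client POSTs to the n8n webhook endpoint (assumed shape)
incoming = json.loads('{"user_id": "u-123", "message": "Do you ship overseas?"}')

# Standardized envelope the response dispatcher returns to the client
outgoing = {
    "user_id": incoming["user_id"],
    "response": "Yes, we ship to most countries.",
    "status": "ok",
}
print(json.dumps(outgoing))
```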
|
|
|
### 3.4 Open WebUI Front-End |
|
|
|
* **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface. |
|
* **Features**: |
|
|
|
* Real-time streaming of text and multimedia. |
|
* Quick-reply button generation. |
|
* Resilient error handling for network or validation interruptions. |
|
|
|
--- |
|
|
|
## 4. Deployment Workflow |
|
|
|
<!-- ### 4.1 Prerequisites |
|
|
|
* Docker Engine & Docker Compose |
|
* Node.js (v16+) and npm |
|
* Python 3.10+ with `nemo-guardrails` |
|
* Ollama CLI for model export |
|
|
|
### 4.2 Model Preparation |
|
|
|
1. **ModelFile Definition**: Create `model.yaml` with base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts. |
|
2. **Model Conversion**: |
|
|
|
```bash |
|
ollama export mistral-7b --output model.gguf |
|
``` |
|
3. **Artifact Publication**: |
|
|
|
```bash |
|
git clone https://huggingface.co/<org>/mistral-7b-gguf |
|
cp model.gguf mistral-7b-gguf/ |
|
cd mistral-7b-gguf |
|
git add model.gguf |
|
git commit -m "JIRA-###: Add Mistral 7B gguf model" |
|
git push |
|
``` |
|
|
|
### 4.3 Guardrails Initialization |
|
|
|
1. Construct the `config/` directory structure as outlined in Section 3.2. |
|
2. Populate `rails/` with Colang `.co` definitions. |
|
3. Install dependencies: |
|
|
|
```bash |
|
pip install nemo-guardrails |
|
``` |
|
4. Launch the Guardrails engine: |
|
|
|
```bash |
|
guardrails run --config config/config.yml |
|
``` |
|
|
|
### 4.4 n8n Orchestration Deployment |
|
|
|
1. Place `chatbot.json` workflow definition in `n8n/workflows/`. |
|
2. Start n8n via Docker Compose: |
|
|
|
```bash |
|
docker-compose up -d n8n |
|
``` |
|
|
|
### 4.5 Front-End Deployment |
|
|
|
```bash |
|
cd open-webui |
|
npm install |
|
# Update API endpoint in config |
|
npm run dev |
|
``` --> |
|
|
|
### 4.6 FastAPI Integration |
|
|
|
Integrate the model and guardrails engine behind a FastAPI service: |
|
|
|
```python
from fastapi import FastAPI
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration from the config/ directory (Section 3.2)
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    # Run the guarded generation pipeline on the user message
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=5000)
```
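
A client exercises the `/chat` endpoint with a JSON POST. The sketch below builds such a request with the standard library (the host and port match the `uvicorn.run` call above); the actual network call is left commented out since it requires the service to be running:

```python
import json
import urllib.request

# Build the request body the /chat endpoint expects
payload = json.dumps({"message": "What are your support hours?"}).encode()

req = urllib.request.Request(
    "http://0.0.0.0:5000/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # requires the service to be running

print(json.loads(payload.decode())["message"])
```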
|
|
|
<!-- --- |
|
|
|
## 5. Operational Procedures |
|
|
|
1. **Receive User Input**: Front-end transmits message to n8n. |
|
2. **Enforce Policies**: Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues. |
|
3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint. |
|
4. **Deliver Output**: n8n returns the structured response to the client. |
|
|
|
--- |
|
|
|
## 6. Maintenance and Diagnostics |
|
|
|
* **Model Updates**: Re-export `.gguf` artifacts and update repository as per Section 4.2. |
|
* **Guardrail Tuning**: Modify Colang `.co` definitions, test via CLI, and redeploy engine. |
|
* **Workflow Monitoring**: Utilize n8n’s built-in analytics dashboard for node-level logs. |
|
* **UI Troubleshooting**: Inspect browser developer console for errors and verify API endpoint configurations. |
|
|
|
--- |
|
|
|
*Document generated based on source materials.* |
|
|
|
|
--> |
|
|