# AI Chatbot System Technical Documentation

---

## 1. Executive Summary

This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.

---

## 2. System Capabilities

- **Natural Language Understanding**: Parses user utterances to identify intents and extract entities.
- **Policy Enforcement**: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
- **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration.
- **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines.

---

## 3. Architectural Components

### 3.1 Custom Language Model

- **Model Architecture**: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
- **Configuration File**: Defined using Ollama’s Modelfile format (`model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates; a sketch appears at the end of this subsection.
- **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) to facilitate efficient loading and inference.

   ```bash
   git clone https://github.com/mattjamo/OllamaToGGUF.git
   cd OllamaToGGUF
   python OllamaToGGUF.py
   ```

- **Repository Deployment**: Published to Hugging Face Model Hub via automated CLI processes, with commit metadata linked to JIRA issue tracking.

   ```bash
   huggingface-cli upload <your-username>/<your-model-name> . .
   ```
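
For reference, the sketch below shows what such a Modelfile definition might look like. It is a minimal, illustrative example; the base tag, sampling values, and system prompt are placeholders, not the project's actual configuration.

```plaintext
# Base checkpoint
FROM mistral:7b

# Sampling parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# Role-based system prompt
SYSTEM """You are a helpful, policy-compliant assistant."""
```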

### 3.2 NVIDIA NeMo Guardrails

- **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
- **Colang Files**: Dialogue policies are written in Colang (`.co` files), a modeling language built from blocks, statements, expressions, keywords, and variables; a minimal example follows the directory layout below. The primary block types are:
  - **User Message Block** (`define user ...`)
  - **Flow Block** (`define flow ...`)
  - **Bot Message Block** (`define bot ...`)
- **Directory Layout**:

  ```plaintext
  config/
  ├── rails/          # Colang flow definitions (.co)
  ├── prompts.yml     # Prompt templates and trigger mappings
  ├── config.yml      # Guardrails engine settings and routing rules
  └── actions.py      # Custom callbacks for external services
  ```
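
As referenced above, here is a minimal Colang 1.0 sketch showing the three block types working together. The greeting intent and canned responses are illustrative placeholders, not the project's actual rails:

```colang
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
```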


### 3.3 Orchestration with n8n

* **Webhook Listener**: Exposes an HTTP POST endpoint that receives JSON-formatted user queries.
* **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
* **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
* **Response Dispatcher**: Consolidates model outputs and returns them to clients in standardized JSON responses.
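
To make the contract concrete, a request/response exchange through this workflow might look as follows; the endpoint path and field names are illustrative assumptions, not the deployed schema:

```plaintext
POST /webhook/chat
{"message": "What are your support hours?"}

200 OK
{"response": "Our support team is available 9am-5pm, Monday through Friday."}
```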

### 3.4 Open WebUI Front-End

* **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface.
* **Features**:

  * Real-time streaming of text and multimedia.
  * Quick-reply button generation.
  * Resilient error handling for network or validation interruptions.

---

## 4. Deployment Workflow

### 4.1 Prerequisites

* Docker Engine & Docker Compose
* Node.js (v16+) and npm
* Python 3.10+ with `nemo-guardrails`
* Ollama CLI and the OllamaToGGUF utility for model conversion

### 4.2 Model Preparation

1. **Modelfile Definition**: Create `model.yaml` with the base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts.
2. **Model Conversion**: Convert the local Ollama model to `.gguf` using the OllamaToGGUF utility from Section 3.1 (the Ollama CLI does not provide a direct export command):

   ```bash
   cd OllamaToGGUF
   python OllamaToGGUF.py
   ```
3. **Artifact Publication**:

   ```bash
   git clone https://huggingface.co/<org>/mistral-7b-gguf
   cp model.gguf mistral-7b-gguf/
   cd mistral-7b-gguf
   git add model.gguf
   git commit -m "JIRA-###: Add Mistral 7B gguf model"
   git push
   ```
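
   Note that Hugging Face model repositories store large binaries via Git LFS; if `*.gguf` is not already tracked in the repo's `.gitattributes`, enable tracking before committing (assuming Git LFS is installed):

   ```bash
   git lfs track "*.gguf"
   git add .gitattributes
   ```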

### 4.3 Guardrails Initialization

1. Construct the `config/` directory structure as outlined in Section 3.2.
2. Populate `rails/` with Colang `.co` definitions.
3. Install dependencies:

   ```bash
   pip install nemo-guardrails
   ```
4. Launch the Guardrails engine (the package installs the `nemoguardrails` CLI):

   ```bash
   nemoguardrails server --config config/
   ```

### 4.4 n8n Orchestration Deployment

1. Place `chatbot.json` workflow definition in `n8n/workflows/`.
2. Start n8n via Docker Compose:

   ```bash
   docker-compose up -d n8n
   ```
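
   For reference, a minimal `docker-compose.yml` service entry might look like the following sketch; the image tag, port, and volume mapping are assumptions to adapt to your environment:

   ```yaml
   services:
     n8n:
       image: n8nio/n8n            # official n8n image
       ports:
         - "5678:5678"             # n8n's default UI/API port
       volumes:
         - ./n8n:/home/node/.n8n   # persist workflows and credentials
   ```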

### 4.5 Front-End Deployment

```bash
cd open-webui
npm install
# Update API endpoint in config
npm run dev
```

### 4.6 FastAPI Integration

Integrate the model and guardrails engine behind a FastAPI service:

```python
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails
from fastapi import FastAPI

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration (rails, prompts, settings) from ./config
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    # Run the user message through the guardrails-wrapped LLM
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)

```
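
With the service running, a quick smoke test can be issued against the endpoint defined above (localhost and port 5000 follow the defaults in the snippet):

```bash
curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, what can you help me with?"}'
```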

---

## 5. Operational Procedures

1. **Receive User Input**: Front-end transmits message to n8n.
2. **Enforce Policies**: Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues.
3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint.
4. **Deliver Output**: n8n returns the structured response to the client.

---

## 6. Maintenance and Diagnostics

* **Model Updates**: Re-export `.gguf` artifacts and update repository as per Section 4.2.
* **Guardrail Tuning**: Modify Colang `.co` definitions, test via CLI, and redeploy engine.
* **Workflow Monitoring**: Utilize n8n’s built-in analytics dashboard for node-level logs.
* **UI Troubleshooting**: Inspect browser developer console for errors and verify API endpoint configurations.
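
For guardrail tuning in particular, updated `.co` flows can be exercised interactively before redeployment, assuming the `nemoguardrails` CLI from Section 4.3:

```bash
nemoguardrails chat --config config/
```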
