Upload folder using huggingface_hub

- .gitignore +4 -4
- Readme.md +198 -0
- docker-compose.yml +33 -0
.gitignore
CHANGED
@@ -1,5 +1,5 @@
 myvenv/
-
-
-
-*.
+data/
+__pycache__/
+*.gguf
+*.ipynb
Readme.md
ADDED
@@ -0,0 +1,198 @@

# AI Chatbot System Technical Documentation

---

## 1. Executive Summary

This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.

---

## 2. System Capabilities

- **Natural Language Understanding**: Implements advanced parsing to interpret user intents and entities.
- **Policy Enforcement**: Uses Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
- **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration.
- **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines.

---

## 3. Architectural Components

### 3.1 Custom Language Model

- **Model Architecture**: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
- **Configuration File**: Defined using Ollama's Modelfile format (`model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates.
- **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) for efficient loading and inference.

```bash
git clone https://github.com/mattjamo/OllamaToGGUF.git
cd OllamaToGGUF
python OllamaToGGUF.py
```
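For reference, a minimal Modelfile covering the three items above (base checkpoint, sampling parameters, role-based system prompt) might look like the sketch below. The directives (`FROM`, `PARAMETER`, `SYSTEM`) are standard Ollama Modelfile syntax; the specific values are illustrative, not taken from the project's `model.yaml`.

```
FROM mistral
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a helpful, policy-compliant support assistant."""
```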
- **Repository Deployment**: Published to the Hugging Face Model Hub via automated CLI processes, with commit metadata linked to JIRA issue tracking.

```bash
huggingface-cli upload <your-username>/<your-model-name> . .
```

### 3.2 NVIDIA NeMo Guardrails

- **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
- **Colang Files**: All `.co` artifacts are written in the Colang modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types are:
  - **User Message Block** (`define user ...`)
  - **Flow Block** (`define flow ...`)
  - **Bot Message Block** (`define bot ...`)
- **Directory Layout**:

```plaintext
config/
├── rails/       # Colang flow definitions (.co)
├── prompts.yml  # Prompt templates and trigger mappings
├── config.yml   # Guardrails engine settings and routing rules
└── actions.py   # Custom callbacks for external services
```
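The three block types combine into a rail as in the following minimal sketch (standard Colang 1.0 syntax; the greeting flow is illustrative, not taken from the project's `rails/` files):

```colang
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
```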

### 3.3 Orchestration with n8n

* **Webhook Listener**: Exposes an HTTP POST endpoint that receives JSON-formatted user queries.
* **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
* **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits the generated output.
* **Response Dispatcher**: Consolidates model outputs and returns them to clients in standardized JSON responses.
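The data flow through the four nodes above can be sketched as plain functions. This is a hedged illustration, not the actual n8n workflow: the blocked-term check stands in for the Guardrails engine, and `run_inference` stands in for the Mistral 7B API call.

```python
def validate_input(message: str) -> str:
    """Policy Validation Node: replace unsafe input with a safe completion."""
    blocked_terms = {"password", "ssn"}  # illustrative policy, not the real rails
    if any(term in message.lower() for term in blocked_terms):
        return "I can't help with that request."
    return message

def run_inference(prompt: str) -> str:
    """Inference Node stub: in production this calls the Mistral 7B API."""
    return f"Echo: {prompt}"

def handle_webhook(payload: dict) -> dict:
    """Webhook Listener in, Response Dispatcher out: one end-to-end turn."""
    message = payload.get("message", "")
    safe_message = validate_input(message)
    if safe_message != message:
        reply = safe_message          # policy replaced the input entirely
    else:
        reply = run_inference(safe_message)
    return {"response": reply}        # standardized JSON response

print(handle_webhook({"message": "hello"}))        # {'response': 'Echo: hello'}
print(handle_webhook({"message": "my password"}))  # safe completion instead
```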

### 3.4 Open WebUI Front-End

* **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface.
* **Features**:

  * Real-time streaming of text and multimedia.
  * Quick-reply button generation.
  * Resilient error handling for network or validation interruptions.

---

## 4. Deployment Workflow

<!-- ### 4.1 Prerequisites

* Docker Engine & Docker Compose
* Node.js (v16+) and npm
* Python 3.10+ with `nemo-guardrails`
* Ollama CLI for model export

### 4.2 Model Preparation

1. **ModelFile Definition**: Create `model.yaml` with the base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts.
2. **Model Conversion**:

   ```bash
   ollama export mistral-7b --output model.gguf
   ```

3. **Artifact Publication**:

   ```bash
   git clone https://huggingface.co/<org>/mistral-7b-gguf
   cp model.gguf mistral-7b-gguf/
   cd mistral-7b-gguf
   git add model.gguf
   git commit -m "JIRA-###: Add Mistral 7B gguf model"
   git push
   ```

### 4.3 Guardrails Initialization

1. Construct the `config/` directory structure as outlined in Section 3.2.
2. Populate `rails/` with Colang `.co` definitions.
3. Install dependencies:

   ```bash
   pip install nemo-guardrails
   ```

4. Launch the Guardrails engine:

   ```bash
   guardrails run --config config/config.yml
   ```

### 4.4 n8n Orchestration Deployment

1. Place the `chatbot.json` workflow definition in `n8n/workflows/`.
2. Start n8n via Docker Compose:

   ```bash
   docker-compose up -d n8n
   ```

### 4.5 Front-End Deployment

```bash
cd open-webui
npm install
# Update API endpoint in config
npm run dev
``` -->

### 4.6 FastAPI Integration

Integrate the model and guardrails engine behind a FastAPI service:

```python
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails
from fastapi import FastAPI

# FastAPI app
app = FastAPI(title="modelkai")

# Guardrails configuration
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)
```
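The `ChatRequest` model gives the endpoint automatic request validation: FastAPI parses the JSON body through pydantic and rejects malformed payloads with an HTTP 422 before the handler runs. A standalone illustration of that behavior (no guardrails engine required):

```python
from pydantic import BaseModel, ValidationError

class ChatRequest(BaseModel):
    message: str

# A well-formed payload parses into a typed object.
req = ChatRequest(message="What are your support hours?")
print(req.message)

# A payload missing the required field raises ValidationError,
# which FastAPI translates into an HTTP 422 response.
try:
    ChatRequest()
except ValidationError as exc:
    print(f"rejected: {len(exc.errors())} error(s)")
```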

<!-- ---

## 5. Operational Procedures

1. **Receive User Input**: The front-end transmits the message to n8n.
2. **Enforce Policies**: The Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues.
3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint.
4. **Deliver Output**: n8n returns the structured response to the client.

---

## 6. Maintenance and Diagnostics

* **Model Updates**: Re-export `.gguf` artifacts and update the repository as per Section 4.2.
* **Guardrail Tuning**: Modify Colang `.co` definitions, test via the CLI, and redeploy the engine.
* **Workflow Monitoring**: Use n8n's built-in analytics dashboard for node-level logs.
* **UI Troubleshooting**: Inspect the browser developer console for errors and verify API endpoint configurations.

---

*Document generated based on source materials.*
-->
docker-compose.yml
CHANGED
@@ -11,3 +11,36 @@ services:
       - "5678:5678"
     depends_on:
       - api
+    environment:
+      - N8N_SECURE_COOKIE=false
+      - N8N_PROTOCOL=http
+      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=false
+      - DB_POSTGRESDB_PASSWORD=dbpass
+      - N8N_OWNER_EMAIL=[email protected]
+      - N8N_OWNER_PASSWORD=yourStrongPassword
+      - N8N_ENCRYPTION_KEY=yourEncryptionKey
+
+  openweb:
+    image: ghcr.io/open-webui/open-webui:main
+    container_name: open-webui
+    ports:
+      - "3000:8080"
+    volumes:
+      - openwebui_data:/app/backend/data
+    environment:
+      # Disable multi-user login (optional)
+      - WEBUI_AUTH=False
+      # If you want Open WebUI to hit your FastAPI or n8n endpoints,
+      # you can point it here, e.g.:
+      # - API_BASE_URL=http://fastapi:8000
+    depends_on:
+      - api
+      - n8n
+
+volumes:
+  openwebui_data:
+
+networks:
+  default:
+    driver: bridge