rmtlabs committed on
Commit 3f9ef9d · verified · 1 Parent(s): 4c246eb

Upload folder using huggingface_hub

Files changed (3)
  1. .gitignore +4 -4
  2. Readme.md +198 -0
  3. docker-compose.yml +33 -0
.gitignore CHANGED
@@ -1,5 +1,5 @@
  myvenv/
- *.ipynb
- **/__pycache__/
- **/myvenv/
- *.md
+ data/
+ __pycache__/
+ *.gguf
+ *.ipynb
Readme.md ADDED
@@ -0,0 +1,198 @@
+
+ # AI Chatbot System Technical Documentation
+
+ ---
+
+ ## 1. Executive Summary
+
+ This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.
+
+ ---
+
+ ## 2. System Capabilities
+
+ - **Natural Language Understanding**: Interprets user intents and extracts entities from free-form input.
+ - **Policy Enforcement**: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
+ - **Low-Latency Responses**: Achieves sub-second turnaround via event-based orchestration.
+ - **Modular Extensibility**: Supports pluggable integrations with external APIs, databases, and analytics pipelines.
+
+ ---
+
+ ## 3. Architectural Components
+
+ ### 3.1 Custom Language Model
+
+ - **Model Architecture**: A fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
+ - **Configuration File**: Defined using Ollama’s Modelfile format (`model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates (a sketch follows this list).
+ - **Artifact Packaging**: Converted to `.gguf` (GPT-Generated Unified Format) for efficient loading and inference.
+
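+ The Modelfile itself is not part of this upload, so the following is only an illustrative sketch using standard Ollama Modelfile directives; the base tag, sampling values, and system prompt are placeholders:
+
+ ```plaintext
+ # Illustrative Modelfile sketch; tag, parameters, and prompt are placeholders
+ FROM mistral:7b
+ PARAMETER temperature 0.7
+ PARAMETER top_p 0.9
+ SYSTEM "You are a concise, helpful assistant for this domain."
+ ```
+
+ The conversion to `.gguf` then uses the OllamaToGGUF helper:
+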
+ ```bash
+ # Convert a local Ollama model into a .gguf artifact with the OllamaToGGUF helper
+ git clone https://github.com/mattjamo/OllamaToGGUF.git
+ cd OllamaToGGUF
+ python OllamaToGGUF.py
+ ```
+
+ - **Repository Deployment**: Published to the Hugging Face Model Hub via the `huggingface-cli`, with commit metadata linked to JIRA issue tracking.
+
+ ```bash
+ # Upload the current directory (.) to the root (.) of the target model repository
+ huggingface-cli upload <your-username>/<your-model-name> . .
+ ```
+
+ ### 3.2 NVIDIA NeMo Guardrails
+
+ - **Function**: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
+ - **Colang Files**: Dialogue policies are written in Colang (`.co` files), a modeling language whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types, illustrated after the directory layout below, are:
+   - **User Message Block** (`define user ...`)
+   - **Flow Block** (`define flow ...`)
+   - **Bot Message Block** (`define bot ...`)
+ - **Directory Layout**:
+
+ ```plaintext
+ config/
+ ├── rails/        # Colang flow definitions (.co)
+ ├── prompts.yml   # Prompt templates and trigger mappings
+ ├── config.yml    # Guardrails engine settings and routing rules
+ └── actions.py    # Custom callbacks for external services
+ ```
+
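+ As a minimal illustration of how the three block types compose (the message and flow names are invented for this example, following standard Colang v1 syntax):
+
+ ```colang
+ define user express greeting
+   "hello"
+   "hi there"
+
+ define bot express greeting
+   "Hello! How can I help you today?"
+
+ define flow greeting
+   user express greeting
+   bot express greeting
+ ```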
+
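+ Likewise, `actions.py` registers Python callbacks that rails can invoke. Below is a minimal sketch using the library's `@action` decorator; the action name and logic are placeholders rather than code from this repository:
+
+ ```python
+ # Sketch of a custom action for config/actions.py; name and logic are placeholders.
+ from nemoguardrails.actions import action
+
+ @action(name="check_order_status")
+ async def check_order_status(order_id: str) -> str:
+     """Stand-in for a call to an external order-tracking service."""
+     return f"Order {order_id} is currently being processed."
+ ```
+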
+ ### 3.3 Orchestration with n8n
+
+ * **Webhook Listener**: Exposes an HTTP POST endpoint that receives JSON-formatted user queries.
+ * **Policy Validation Node**: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
+ * **Inference Node**: Forwards validated prompts to the Mistral 7B inference API and awaits the generated output.
+ * **Response Dispatcher**: Consolidates model outputs and returns them to clients as standardized JSON responses (see the client sketch below).
+
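+ A client-side sketch of this exchange follows; the webhook path and JSON field names are assumptions, as the workflow definition itself is not included here:
+
+ ```python
+ # Sketch of a client calling the n8n webhook; the URL and field names are assumed.
+ import requests
+
+ N8N_WEBHOOK_URL = "http://localhost:5678/webhook/chatbot"  # hypothetical path
+
+ def send_query(message: str) -> str:
+     """POST a user query as JSON and return the chatbot's reply."""
+     resp = requests.post(N8N_WEBHOOK_URL, json={"message": message}, timeout=30)
+     resp.raise_for_status()
+     return resp.json()["response"]  # field name mirrors the /chat API in Section 4.6
+
+ if __name__ == "__main__":
+     print(send_query("Hello!"))
+ ```
+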
+ ### 3.4 Open WebUI Front-End
+
+ * **UI Framework**: Based on the Open WebUI library, providing a reactive chat interface.
+ * **Features**:
+   * Real-time streaming of text and multimedia.
+   * Quick-reply button generation.
+   * Resilient error handling for network or validation interruptions.
+
+ ---
+
+ ## 4. Deployment Workflow
+
+ <!-- ### 4.1 Prerequisites
+
+ * Docker Engine & Docker Compose
+ * Node.js (v16+) and npm
+ * Python 3.10+ with `nemo-guardrails`
+ * Ollama CLI for model export
+
+ ### 4.2 Model Preparation
+
+ 1. **ModelFile Definition**: Create `model.yaml` with base model reference (`mistral-7b`), sampling hyperparameters, and role-based prompts.
+ 2. **Model Conversion**:
+
+ ```bash
+ ollama export mistral-7b --output model.gguf
+ ```
+ 3. **Artifact Publication**:
+
+ ```bash
+ git clone https://huggingface.co/<org>/mistral-7b-gguf
+ cp model.gguf mistral-7b-gguf/
+ cd mistral-7b-gguf
+ git add model.gguf
+ git commit -m "JIRA-###: Add Mistral 7B gguf model"
+ git push
+ ```
+
+ ### 4.3 Guardrails Initialization
+
+ 1. Construct the `config/` directory structure as outlined in Section 3.2.
+ 2. Populate `rails/` with Colang `.co` definitions.
+ 3. Install dependencies:
+
+ ```bash
+ pip install nemo-guardrails
+ ```
+ 4. Launch the Guardrails engine:
+
+ ```bash
+ guardrails run --config config/config.yml
+ ```
+
+ ### 4.4 n8n Orchestration Deployment
+
+ 1. Place `chatbot.json` workflow definition in `n8n/workflows/`.
+ 2. Start n8n via Docker Compose:
+
+ ```bash
+ docker-compose up -d n8n
+ ```
+
+ ### 4.5 Front-End Deployment
+
+ ```bash
+ cd open-webui
+ npm install
+ # Update API endpoint in config
+ npm run dev
+ ``` -->
+
+ ### 4.6 FastAPI Integration
+
+ Integrate the model and guardrails engine behind a FastAPI service:
+
+ ```python
+ from fastapi import FastAPI
+ from pydantic import BaseModel
+ from nemoguardrails import RailsConfig, LLMRails
+
+ # FastAPI application
+ app = FastAPI(title="modelkai")
+
+ # Load the guardrails configuration (config.yml, rails/, prompts.yml, actions.py)
+ config = RailsConfig.from_path("./config")
+ rails = LLMRails(config, verbose=True)
+
+ class ChatRequest(BaseModel):
+     message: str
+
+ @app.post("/chat")
+ async def chat_endpoint(request: ChatRequest):
+     # Route the user message through the guardrails engine, which applies
+     # the Colang rails before and after LLM generation.
+     response = await rails.generate_async(
+         messages=[{"role": "user", "content": request.message}]
+     )
+     return {"response": response["content"]}
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run(app, host="0.0.0.0", port=5000)
+ ```
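+
+ Once the service is running, the endpoint can be exercised with, for example: `curl -X POST http://localhost:5000/chat -H "Content-Type: application/json" -d '{"message": "Hello"}'`.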
+
+ <!-- ---
+
+ ## 5. Operational Procedures
+
+ 1. **Receive User Input**: Front-end transmits message to n8n.
+ 2. **Enforce Policies**: Guardrails engine evaluates content; unsafe inputs invoke fallback dialogues.
+ 3. **Generate Response**: Sanitized prompts are processed by the LLM inference endpoint.
+ 4. **Deliver Output**: n8n returns the structured response to the client.
+
+ ---
+
+ ## 6. Maintenance and Diagnostics
+
+ * **Model Updates**: Re-export `.gguf` artifacts and update repository as per Section 4.2.
+ * **Guardrail Tuning**: Modify Colang `.co` definitions, test via CLI, and redeploy engine.
+ * **Workflow Monitoring**: Utilize n8n’s built-in analytics dashboard for node-level logs.
+ * **UI Troubleshooting**: Inspect browser developer console for errors and verify API endpoint configurations.
+
+ ---
+
+ *Document generated based on source materials.*
+
+ -->
docker-compose.yml CHANGED
@@ -11,3 +11,36 @@ services:
      - "5678:5678"
    depends_on:
      - api
+     environment:
+       - N8N_SECURE_COOKIE=false
+       - N8N_PROTOCOL=http
+       - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=false
+       - DB_POSTGRESDB_PASSWORD=dbpass
+       - N8N_OWNER_EMAIL=[email protected]
+       - N8N_OWNER_PASSWORD=yourStrongPassword
+       - N8N_ENCRYPTION_KEY=yourEncryptionKey
+
+
+   openweb:
+     image: ghcr.io/open-webui/open-webui:main
+     container_name: open-webui
+     ports:
+       - "3000:8080"
+     volumes:
+       - openwebui_data:/app/backend/data
+     environment:
+       # Disable multi-user login (optional)
+       - WEBUI_AUTH=False
+       # If you want Open WebUI to hit your FastAPI or n8n endpoints,
+       # you can point it here, e.g.:
+       # - API_BASE_URL=http://fastapi:8000
+     depends_on:
+       - api
+       - n8n
+
+ volumes:
+   openwebui_data:
+
+ networks:
+   default:
+     driver: bridge