AI Chatbot System Technical Documentation
1. Executive Summary
This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.
2. System Capabilities
- Natural Language Understanding: Parses user utterances to extract intents and entities.
- Policy Enforcement: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
- Low-Latency Responses: Achieves sub-second turnaround via event-based orchestration.
- Modular Extensibility: Supports pluggable integrations with external APIs, databases, and analytics pipelines.
3. Architectural Components
3.1 Custom Language Model
Model Architecture: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.
Configuration File: Defined using Ollama’s Modelfile format (`model.yaml`), specifying the base checkpoint, sampling parameters, and role-based prompt templates.
Artifact Packaging: Converted to `.gguf` (GPT-Generated Unified Format) for efficient loading and inference:

```shell
git clone https://github.com/mattjamo/OllamaToGGUF.git
cd OllamaToGGUF
python OllamaToGGUF.py
```
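For illustration, a minimal Modelfile along the lines described above might contain the following; the base model, parameter values, and system prompt are assumptions, not the project's actual configuration:

```
FROM mistral:7b
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM """You are a helpful, safety-conscious assistant."""
```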
Repository Deployment: Published to Hugging Face Model Hub via automated CLI processes, with commit metadata linked to JIRA issue tracking.
```shell
huggingface-cli upload <your-username>/<your-model-name> . .
```
3.2 NVIDIA NeMo Guardrails
Function: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.
Colang Files: All `.co` artifacts follow the Colang modeling language syntax, which comprises blocks, statements, expressions, keywords, and variables. The primary block types are:
- User Message Block (`define user ...`)
- Flow Block (`define flow ...`)
- Bot Message Block (`define bot ...`)
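Putting the three block types together, a small illustrative rail (the intent and message names here are hypothetical, not taken from the project's rails) could look like:

```
define user ask pricing
  "How much does it cost?"
  "What are your prices?"

define bot answer pricing
  "Pricing details are available on our website."

define flow pricing
  user ask pricing
  bot answer pricing
```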
Directory Layout:
```
config/
├── rails/        # Colang flow definitions (.co)
├── prompts.yml   # Prompt templates and trigger mappings
├── config.yml    # Guardrails engine settings and routing rules
└── actions.py    # Custom callbacks for external services
```
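A minimal `config.yml` for this layout might look like the sketch below; the engine and model names are illustrative assumptions, and the actual values depend on the deployment:

```
models:
  - type: main
    engine: ollama
    model: mistral:7b

rails:
  input:
    flows:
      - self check input
```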
3.3 Orchestration with n8n
- Webhook Listener: Exposes HTTP POST endpoint to receive JSON-formatted user queries.
- Policy Validation Node: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
- Inference Node: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
- Response Dispatcher: Consolidates model outputs and returns them to clients in standardized JSON responses.
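The four nodes above can be sketched as a plain-Python pipeline. Every function here is an illustrative stand-in for the corresponding n8n node (the toy policy and echo model are assumptions, not project code):

```python
import json

def receive(raw: str) -> str:
    """Webhook Listener: parse the JSON-formatted user query."""
    return json.loads(raw)["message"]

def is_safe(message: str) -> bool:
    """Policy Validation Node: stand-in for the Guardrails check."""
    blocked = {"forbidden"}  # toy policy for illustration
    return not any(word in message.lower() for word in blocked)

def infer(prompt: str) -> str:
    """Inference Node: stand-in for the Mistral 7B inference API."""
    return f"Echo: {prompt}"

def dispatch(output: str) -> str:
    """Response Dispatcher: wrap the output in the standard JSON reply."""
    return json.dumps({"response": output})

def handle(raw: str) -> str:
    message = receive(raw)
    if not is_safe(message):
        # Unsafe inputs are replaced with a safe completion.
        return dispatch("I'm sorry, I can't help with that request.")
    return dispatch(infer(message))

print(handle('{"message": "Hello"}'))  # {"response": "Echo: Hello"}
```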
3.4 Open WebUI Front-End
UI Framework: Based on the Open WebUI library, providing a reactive chat interface.
Features:
- Real-time streaming of text and multimedia.
- Quick-reply button generation.
- Resilient error handling for network or validation interruptions.
4. Deployment Workflow
4.6 FastAPI Integration
Integrate the model and guardrails engine behind a FastAPI service:
```python
from fastapi import FastAPI
from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails

# FastAPI application
app = FastAPI(title="modelkai")

# Configuration of guardrails
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)
```
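Once the service is running, the `/chat` endpoint can be exercised with a request such as the following (the host and port match the `uvicorn.run` call above; the reply follows the `{"response": ...}` contract):

```
curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'
```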