
AI Chatbot System Technical Documentation


1. Executive Summary

This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.


2. System Capabilities

  • Natural Language Understanding: Implements advanced parsing to interpret user intents and entities.
  • Policy Enforcement: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
  • Low-Latency Responses: Achieves sub-second turnaround via event-based orchestration.
  • Modular Extensibility: Supports pluggable integrations with external APIs, databases, and analytics pipelines.

3. Architectural Components

3.1 Custom Language Model

  • Model Architecture: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.

  • Configuration File: Defined using Ollama’s Modelfile format (here stored as model.yaml), specifying the base checkpoint, sampling parameters, and role-based prompt templates.

  • Artifact Packaging: Converted to .gguf (GPT-Generated Unified Format) to facilitate efficient loading and inference.

    
       git clone https://github.com/mattjamo/OllamaToGGUF.git
       cd OllamaToGGUF
       python OllamaToGGUF.py
    
  • Repository Deployment: Published to the Hugging Face Model Hub via the huggingface-cli, with commit metadata linked to JIRA issue tracking.

    
       huggingface-cli upload <your-username>/<your-model-name> . .
    

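As a sketch, a Modelfile along the lines described above might look like the following. The base model tag, parameter values, and system prompt are illustrative assumptions, not the production configuration:

```
FROM mistral:7b

PARAMETER temperature 0.7
PARAMETER num_ctx 4096

SYSTEM """
You are a helpful domain assistant. Answer concisely and stay on topic.
"""
```

Ollama builds a runnable model from such a file with `ollama create <name> -f <Modelfile>`.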
3.2 NVIDIA NeMo Guardrails

  • Function: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.

  • Colang Files: All .co artifacts are written in Colang, the guardrails modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types are:

    • User Message Block (define user ...)
    • Flow Block (define flow ...)
    • Bot Message Block (define bot ...)
  • Directory Layout:

    
    config/
    ├── rails/          # Colang flow definitions (.co)
    ├── prompts.yml     # Prompt templates and trigger mappings
    ├── config.yml      # Guardrails engine settings and routing rules
    └── actions.py      # Custom callbacks for external services
    
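As a sketch, a single rails/*.co file combining the three block types could look like the following. The greeting intent and the example utterances are hypothetical, not taken from the production rails:

```
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
```

When a user message matches the examples under the user block, the flow triggers and the bot responds with the canned message defined in the bot block.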

3.3 Orchestration with n8n

  • Webhook Listener: Exposes HTTP POST endpoint to receive JSON-formatted user queries.
  • Policy Validation Node: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
  • Inference Node: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
  • Response Dispatcher: Consolidates model outputs and returns them to clients in standardized JSON responses.
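The request/response contract between the webhook listener and the response dispatcher can be sketched in Python. The field names `message` and `response` are assumptions for illustration; match them to the actual n8n workflow:

```python
import json


def build_request(message: str) -> str:
    """Serialize a user query into the JSON body posted to the webhook."""
    return json.dumps({"message": message})


def parse_response(body: str) -> str:
    """Extract the model output from the dispatcher's standardized JSON reply."""
    payload = json.loads(body)
    return payload["response"]


# Round-trip illustration of the two payload shapes
request_body = build_request("What are your support hours?")
reply_body = json.dumps({"response": "Our support team is available 24/7."})
print(parse_response(reply_body))
```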

3.4 Open WebUI Front-End

  • UI Framework: Based on the Open WebUI library, providing a reactive chat interface.

  • Features:

    • Real-time streaming of text and multimedia.
    • Quick-reply button generation.
    • Resilient error handling for network or validation interruptions.

4. Deployment Workflow

4.6 FastAPI Integration

Integrate the model and guardrails engine behind a FastAPI service:

from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails
from fastapi import FastAPI

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration from the config/ directory
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)