
AI Chatbot System Technical Documentation


1. Executive Summary

This document specifies the architecture, operational components, and deployment workflow for the AI-driven chatbot system. It is intended for engineering teams responsible for system integration, maintenance, and scalability.


2. System Capabilities

  • Natural Language Understanding: Implements advanced parsing to interpret user intents and entities.
  • Policy Enforcement: Utilizes Colang-defined guardrails to ensure compliance with domain-specific and safety requirements.
  • Low-Latency Responses: Achieves sub-second turnaround via event-based orchestration.
  • Modular Extensibility: Supports pluggable integrations with external APIs, databases, and analytics pipelines.

3. Architectural Components

3.1 Custom Language Model

  • Model Architecture: Fine-tuned Mistral 7B large language model, optimized for dialogue tasks.

  • Configuration File: Defined using Ollama’s Modelfile format (here stored as model.yaml), specifying the base checkpoint, sampling parameters, and role-based prompt templates.

  • Artifact Packaging: Converted to .gguf (GPT-Generated Unified Format) to facilitate efficient loading and inference.

    
       git clone https://github.com/mattjamo/OllamaToGGUF.git
       cd OllamaToGGUF
       python OllamaToGGUF.py
    
  • Repository Deployment: Published to the Hugging Face Model Hub via the huggingface-cli, with commit metadata linked to JIRA issue tracking.

    
       huggingface-cli upload <your-username>/<your-model-name> . .
    

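As a sketch, a Modelfile along the lines described above might look like the following. The base model tag, parameter values, and system prompt are illustrative assumptions, not the production configuration:

```
FROM mistral:7b

PARAMETER temperature 0.7
PARAMETER num_ctx 4096

SYSTEM """
You are a helpful domain assistant. Answer concisely and stay on topic.
"""
```

Ollama builds a runnable model from such a file with `ollama create <name> -f <Modelfile>`.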
3.2 NVIDIA NeMo Guardrails

  • Function: Applies programmable constraints to user-system interactions to enforce safe and contextually appropriate dialogues.

  • Colang Files: All .co artifacts are written in Colang, the guardrails modeling language, whose syntax comprises blocks, statements, expressions, keywords, and variables. The primary block types are:

    • User Message Block (define user ...)
    • Flow Block (define flow ...)
    • Bot Message Block (define bot ...)
  • Directory Layout:

    
    config/
    ├── rails/          # Colang flow definitions (.co)
    ├── prompts.yml     # Prompt templates and trigger mappings
    ├── config.yml      # Guardrails engine settings and routing rules
    └── actions.py      # Custom callbacks for external services
    
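As a sketch, a single rails/*.co file combining the three block types could look like the following. The greeting intent and the example utterances are hypothetical, not taken from the production rails:

```
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
```

When a user message matches the examples under the user block, the flow triggers and the bot responds with the canned message defined in the bot block.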

3.3 Orchestration with n8n

  • Webhook Listener: Exposes HTTP POST endpoint to receive JSON-formatted user queries.
  • Policy Validation Node: Routes incoming content to the Guardrails engine; invalid or unsafe inputs are replaced with safe completions.
  • Inference Node: Forwards validated prompts to the Mistral 7B inference API and awaits generated output.
  • Response Dispatcher: Consolidates model outputs and returns them to clients in standardized JSON responses.
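The request/response contract between the webhook listener and the response dispatcher can be sketched in Python. The field names `message` and `response` are assumptions for illustration; match them to the actual n8n workflow:

```python
import json


def build_request(message: str) -> str:
    """Serialize a user query into the JSON body posted to the webhook."""
    return json.dumps({"message": message})


def parse_response(body: str) -> str:
    """Extract the model output from the dispatcher's standardized JSON reply."""
    payload = json.loads(body)
    return payload["response"]


# Round-trip illustration of the two payload shapes
request_body = build_request("What are your support hours?")
reply_body = json.dumps({"response": "Our support team is available 24/7."})
print(parse_response(reply_body))
```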

3.4 Open WebUI Front-End

  • UI Framework: Based on the Open WebUI library, providing a reactive chat interface.

  • Features:

    • Real-time streaming of text and multimedia.
    • Quick-reply button generation.
    • Resilient error handling for network or validation interruptions.

4. Deployment Workflow

4.6 FastAPI Integration

Integrate the model and guardrails engine behind a FastAPI service:

from pydantic import BaseModel
from nemoguardrails import RailsConfig, LLMRails
from fastapi import FastAPI

# FastAPI application
app = FastAPI(title="modelkai")

# Load the guardrails configuration from the config/ directory
config = RailsConfig.from_path("./config")
rails = LLMRails(config, verbose=True)

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    response = await rails.generate_async(
        messages=[{"role": "user", "content": request.message}]
    )
    return {"response": response["content"]}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=5000)