---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
tags:
  - llama-factory
  - full
  - generated_from_trainer
  - text2diagram
  - plantuml
  - code-generation
model-index:
  - name: pumlGenV2-1
    results: []
---

# pumlGenV2-1

This model is a fine-tuned version of Qwen/Qwen3-8B-Base on the pumlGen dataset. It specializes in generating PlantUML diagrams from natural-language questions.

## Model description

pumlGenV2-1 is a specialized language model that converts complex questions into structured PlantUML diagrams. The model takes philosophical, historical, legal, or analytical questions as input and generates comprehensive PlantUML code that visualizes the relationships, hierarchies, and connections between concepts mentioned in the question.

Key features:

  • Generates syntactically correct PlantUML diagrams
  • Creates structured visualizations with packages, entities, and relationships
  • Adds contextual notes and annotations
  • Handles complex domain-specific topics across various fields
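To illustrate the kind of output the model targets (a hand-written sketch, not actual model output), a generated diagram might look like:

```plantuml
@startuml
package "Nile Agriculture" {
  entity "Annual Flooding" as flood
  entity "Silt Deposition" as silt
  entity "Crop Yields" as crops
}
flood --> silt : deposits
silt --> crops : enriches soil
note right of crops
  Surplus grain supports kingdom stability
end note
@enduml
```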

## Intended uses & limitations

### Intended uses

  • Educational purposes: Creating visual diagrams to explain complex concepts
  • Research visualization: Mapping relationships between ideas, theories, or historical events
  • Documentation: Generating diagrams for technical or conceptual documentation
  • Analysis tools: Visualizing interconnections in philosophical, legal, or social topics

### Limitations

  • The model is trained specifically for the PlantUML output format
  • Performs best on analytical, philosophical, historical, and conceptual questions
  • May require post-processing to match specific PlantUML styling preferences
  • Generated diagrams should be reviewed for accuracy and completeness

## Training and evaluation data

The model was trained on the pumlGen dataset, which consists of question-answer pairs where:

  • Input: Complex analytical questions about various topics (philosophy, history, law, social sciences)
  • Output: Corresponding PlantUML diagram code that visualizes the concepts and relationships
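A record therefore pairs a question with diagram code. A hypothetical example of the shape (the field names are illustrative, not the dataset's actual schema):

```python
# Hypothetical record shape; the actual pumlGen field names may differ.
record = {
    "question": "How did trial by jury develop in English common law?",
    "answer": (
        "@startuml\n"
        'package "English Common Law" {\n'
        '  entity "Trial by Jury" as jury\n'
        "}\n"
        "@enduml"
    ),
}

# Every target diagram is wrapped in @startuml/@enduml markers.
assert record["answer"].startswith("@startuml")
assert record["answer"].endswith("@enduml")
```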

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: AdamW (8-bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 3.0
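The batch-size figures above are internally consistent: the total train batch size is the per-device batch size multiplied by the number of devices and the gradient-accumulation steps.

```python
# Effective (total) train batch size from the hyperparameters above.
train_batch_size = 1              # per device
num_devices = 8                   # multi-GPU
gradient_accumulation_steps = 16

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 128
```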

### Training results

The model demonstrates strong capabilities in:

  • Generating valid PlantUML syntax
  • Creating meaningful entity relationships
  • Adding appropriate annotations and notes
  • Structuring complex information hierarchically

### Framework versions

  • Transformers 4.52.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.1

## Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-username/pumlGenV2-1")
tokenizer = AutoTokenizer.from_pretrained("your-username/pumlGenV2-1")

# Prepare the input in conversation format
question = "What role does the annual flooding of the Nile play in the overall agricultural success and survival of the kingdoms along its banks?"

messages = [
    {"role": "user", "content": question},
]

# Format the input (adjust based on your specific tokenizer's chat template)
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the PlantUML diagram
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and extract the PlantUML code (between @startuml and @enduml)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
plantuml_code = response.split("@startuml")[-1].split("@enduml")[0]
plantuml_code = "@startuml" + plantuml_code + "@enduml"

print(plantuml_code)
```
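The split-based extraction above is fragile when a marker is missing from the generation. A slightly more defensive variant (a sketch, not part of the model's API):

```python
def extract_plantuml(response):
    """Return the first @startuml ... @enduml block in `response`, or None."""
    start = response.find("@startuml")
    if start == -1:
        return None
    end = response.find("@enduml", start)
    if end == -1:
        return None
    return response[start:end + len("@enduml")]
```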

## Eval Q1

Can artificial intelligence ever achieve true understanding, or is it limited to sophisticated pattern recognition? Break this down by examining the nature of consciousness, the semantics of 'understanding,' the boundaries of computational logic, and the role of embodiment in cognition—then map these components into a coherent framework

*(rendered PlantUML diagram image)*

## Eval Q2

Question:

Generate a PlantUML diagram that visualizes a microservices-based e-commerce architecture with the following components and relationships:

  1. Frontend Services:

    • A React-based 'Storefront' app (handles product listings, cart, and checkout).
    • An Angular-based 'Admin Dashboard' (for inventory, orders, and analytics).
    • Both interact with an 'API Gateway' (Kong) that routes requests to backend services.
  2. Backend Services:

    • 'Product Service' (manages SKUs, pricing, and inventory; uses PostgreSQL).
    • 'Order Service' (processes transactions, integrates with Stripe/PayPal; uses MongoDB).
    • 'User Service' (handles authentication via JWT/OAuth2; Redis cache for sessions).
    • 'Recommendation Service' (ML-driven, trained via TensorFlow; pulls data from a Kafka stream).
    • 'Notification Service' (email/SMS alerts via AWS SNS).
  3. Supporting Infrastructure:

    • Docker containers orchestrated via Kubernetes (with labeled nodes for 'prod' and 'staging').
    • CI/CD pipeline (GitHub Actions → Docker Hub → ArgoCD for deployments).
    • Monitoring stack (Prometheus + Grafana, with custom dashboards per service).
    • External dependencies (Stripe API, Twilio API, and a legacy ERP system exposed via REST).
  4. Data Flow:

    • Async communication between services via RabbitMQ (e.g., order confirmations → notifications).
    • Event sourcing for 'Order Service' using Kafka (commands vs. events).
    • CQRS pattern separating read/write databases for 'Product Service.'
  5. Security & Observability:

    • TLS/mTLS between services.
    • Istio for service mesh (with circuit breakers and retries).
    • Distributed tracing (Jaeger) and structured logging (ELK stack).

Additional Requirements:

  • Color-code services by domain (e.g., yellow for payment, green for inventory).
  • Annotate critical interactions (e.g., 'HTTP POST /orders').
  • Include a legend explaining symbols (containers, queues, databases).
    • Optionally, overlay a sequence diagram snippet showing the 'checkout flow' (user → API Gateway → Order Service → Payment → Notification).

*(rendered PlantUML diagram image)*