---
language:
  - en
license: apache-2.0
library_name: mlx
tags:
  - mlx
  - apple-silicon
  - qwen
  - fine-tuned
  - apple
  - m1
  - m2
  - m3
base_model: Qwen/Qwen3-0.6B
model_type: text-generation
pipeline_tag: text-generation
inference: false
datasets:
  - custom
metrics:
  - perplexity
model-index:
  - name: qwen3-0.6b-mlx-my1stVS
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: custom
          name: MLX Fine-tuning Dataset
        metrics:
          - type: perplexity
            value: TBD
            name: Perplexity
widget:
  - text: |-
      ### Instruction: What is Apple MLX?
      ### Response:
    example_title: MLX Question
  - text: |-
      ### Instruction: How do I install MLX?
      ### Response:
    example_title: Installation Guide
  - text: |-
      ### Instruction: What are the benefits of fine-tuning with MLX?
      ### Response:
    example_title: MLX Benefits
---

qwen3-0.6b-mlx-my1stVS

Fine-tuned with Apple MLX Framework

This model is a fine-tuned version of Qwen3-0.6B optimized for Apple Silicon (M1/M2/M3/M4) using the MLX framework.

🍎 MLX Framework Benefits

  • 2-10x faster inference on Apple Silicon
  • 50-80% lower memory usage with quantization (see the conversion sketch after this list)
  • Native optimization for Apple M-series chips
  • Simple deployment with no CUDA dependencies
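
Quantization happens at conversion time. As a minimal sketch, assuming a recent mlx_lm version that exports convert (the output directory name here is arbitrary), a 4-bit copy of the base model can be produced like this:

from mlx_lm import convert

# Download the base model and write a 4-bit quantized MLX copy.
# quantize=True defaults to 4-bit weights with group size 64.
convert(
    "Qwen/Qwen3-0.6B",
    mlx_path="./mlx_model_q4",
    quantize=True,
)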

🚀 Quick Start

Using with MLX (Recommended for Apple Silicon)

from mlx_lm import load, generate

# Load the fine-tuned model from the Hugging Face Hub
model, tokenizer = load("TJ498/qwen3-0.6b-mlx-my1stVS")

# Generate text using the instruction-response template the model was trained on
prompt = "### Instruction: What is Apple MLX?\n\n### Response:"
response = generate(model, tokenizer, prompt, max_tokens=100)
print(response)
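
For interactive use, mlx_lm also provides stream_generate, which yields output incrementally. This is a sketch assuming a recent mlx_lm version, where each yielded item is a response object with a .text field (older versions yield plain strings):

from mlx_lm import load, stream_generate

model, tokenizer = load("TJ498/qwen3-0.6b-mlx-my1stVS")
prompt = "### Instruction: What is Apple MLX?\n\n### Response:"

# Print each chunk of text as it is generated instead of waiting for the full response.
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=100):
    print(chunk.text, end="", flush=True)
print()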

Using LoRA Adapters

# Clone the repository (the model weights are stored with Git LFS)
git lfs install
git clone https://huggingface.co/TJ498/qwen3-0.6b-mlx-my1stVS
cd qwen3-0.6b-mlx-my1stVS

# Generate with the LoRA adapters applied
python -m mlx_lm.generate --model ./mlx_model --adapter-path ./adapters --prompt "Your prompt"
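
The adapters can also be applied from Python. As a sketch, assuming the cloned repository contains the mlx_model and adapters folders shown above, mlx_lm's load accepts an adapter_path argument:

from mlx_lm import load, generate

# Load the converted base weights and apply the LoRA adapters at load time.
# "./mlx_model" and "./adapters" assume the folder layout of the cloned repo.
model, tokenizer = load("./mlx_model", adapter_path="./adapters")

prompt = "### Instruction: What is Apple MLX?\n\n### Response:"
print(generate(model, tokenizer, prompt, max_tokens=100))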

📊 Model Details

  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Framework: Apple MLX
  • Training Date: 2025-07-22
  • Parameters: ~600M base + ~0.66M LoRA adapters
  • Quantization: 4-bit quantization applied
  • Memory Usage: ~0.5GB for inference

🎯 Training Details

  • Training Iterations: 50
  • Batch Size: 1
  • Learning Rate: 1e-05
  • LoRA Rank: 16
  • LoRA Alpha: 16
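
As a rough sketch, these hyperparameters correspond to an mlx_lm LoRA run along the following lines. The data path is a placeholder, and in recent mlx_lm versions the LoRA rank and alpha (16/16 here) are supplied through a YAML config passed with --config rather than as command-line flags:

python -m mlx_lm.lora \
  --model Qwen/Qwen3-0.6B \
  --train \
  --data ./data \
  --iters 50 \
  --batch-size 1 \
  --learning-rate 1e-5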

📚 Usage Examples

The model is trained to follow an instruction-response format:

### Instruction: Your question here

### Response: Model's answer
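
A small helper keeps prompts consistent with this template. build_prompt below is an illustrative name, not part of the model or mlx_lm:

def build_prompt(instruction: str) -> str:
    # Wrap a question in the instruction-response template used during fine-tuning.
    return f"### Instruction: {instruction}\n\n### Response:"

prompt = build_prompt("What is Apple MLX?")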

⚡ Performance

Optimized for Apple Silicon with significant performance improvements:

  • Inference Speed: roughly 150-200 tokens/sec on M1/M2/M3
  • Memory Footprint: under 1GB during inference
  • Power Consumption: around 60% less than comparable non-MLX frameworks
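
These numbers vary by machine. A quick way to measure throughput locally is to time a generation and count the output tokens; a minimal sketch (re-encoding the output to count tokens is approximate):

import time
from mlx_lm import load, generate

model, tokenizer = load("TJ498/qwen3-0.6b-mlx-my1stVS")
prompt = "### Instruction: What is Apple MLX?\n\n### Response:"

start = time.perf_counter()
response = generate(model, tokenizer, prompt, max_tokens=100)
elapsed = time.perf_counter() - start

# Approximate throughput: re-encode the generated text to count tokens.
n_tokens = len(tokenizer.encode(response))
print(f"{n_tokens / elapsed:.1f} tokens/sec")

In recent mlx_lm versions, passing verbose=True to generate also prints prompt and generation speeds directly.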

🛠️ Requirements

  • Apple Silicon Mac (M1/M2/M3/M4)
  • macOS 13.3 or later
  • Python 3.9+
  • MLX framework: pip install mlx mlx-lm
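
A quick sanity check that the environment is set up correctly: on Apple Silicon, MLX should report the GPU as its default device. A minimal sketch:

import platform
import mlx.core as mx

# MLX requires Apple Silicon; both checks should pass on M-series Macs.
print(platform.machine())   # expected: arm64
print(mx.default_device())  # expected: Device(gpu, 0)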

📄 License

This model is released under the Apache 2.0 license.

🤗 Model Hub

This model is available on the Hugging Face Hub: https://huggingface.co/TJ498/qwen3-0.6b-mlx-my1stVS


Fine-tuned with ❤️ using Apple MLX Framework