LFM2.5-230M - Core ML (fp16)

A Core ML conversion of LiquidAI's LFM2.5-230M in 16-bit floating point (fp16), intended for on-device inference on Apple devices (iPhone, iPad, Mac). The weights stay at fp16 with no further quantization, and the model integrates with iOS/macOS apps via Core ML.


🔍 Model Overview

| Feature | Details |
| --- | --- |
| Base Model | LiquidAI/LFM2.5-230M-Base |
| Fine-tuned Model | LiquidAI/LFM2.5-230M |
| Precision | fp16 (16-bit floating point) |
| Framework | Core ML (Apple's machine learning framework) |
| Architecture | Lfm2ForCausalLM (Hybrid: 8× Double-Gated LIV Convolution + 6× Grouped-Query Attention) |
| Parameters | 230M |
| Context Length | 128,000 tokens |
| Vocabulary | 65,536 tokens |
| License | Other (Check LiquidAI for details) |
| Download Size | ~1.06 GB (.mlpackage + .safetensors) |
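
As a rough sanity check on the download size, fp16 stores each parameter in 2 bytes, so the weight payload can be estimated from the parameter count. The sketch below assumes exactly 230M parameters and two full copies of the weights in the repository (one inside model.mlpackage, one in hf_model/model.safetensors); the helper name is illustrative.

```python
BYTES_PER_FP16 = 2


def fp16_weight_bytes(num_params: int) -> int:
    """Estimated size of the raw weights when every parameter is stored as fp16."""
    return num_params * BYTES_PER_FP16


params = 230_000_000  # assumed: the nominal "230M" parameter count
one_copy = fp16_weight_bytes(params)
two_copies = 2 * one_copy  # Core ML package + safetensors file

print(f"one copy:   {one_copy / 1e6:.0f} MB")
print(f"two copies: {two_copies / 1e9:.2f} GB")
```

This gives about 0.92 GB for the two weight copies, in the same range as the listed ~1.06 GB. The estimate does not account for the remainder: the nominal parameter count is rounded, and the repository also holds tokenizer and metadata files.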

🎯 Capabilities

✅ On-Device Inference – Optimized for Apple Silicon (iPhone, iPad, Mac) via Core ML.
✅ Blazing Fast – 213 tok/s on Galaxy S25 Ultra, 42 tok/s on Raspberry Pi 5 (LiquidAI benchmarks on non-Apple hardware, not measurements of this Core ML build).
✅ Efficient Architecture – Hybrid convolution + attention layers for speed and accuracy.
✅ Long Context – 128K token context window (32K extension phase included in training).
✅ General-Purpose – Text-only model trained on 19T tokens with distillation, DPO, and RL.
✅ Agentic-Ready – Designed for agent workflows, automation, and edge deployment.


📊 Performance Highlights

| Metric | Value | Notes |
| --- | --- | --- |
| Parameters | 230M | Compact yet powerful. |
| Layers | 14 | 8× Convolution (LIV) + 6× Grouped-Query Attention (GQA). |
| Hidden Size | 1,024 | |
| Intermediate Size | 2,560 | |
| Attention Heads | 16 (8 KV heads) | Grouped-Query Attention for efficiency. |
| Pre-training Tokens | 19T | Including 32K context extension phase. |
| Post-Training | SFT + DPO + Multi-Domain RL | Distilled from LFM2.5-350M for competitive performance. |
| Speed (Edge) | 213 tok/s (S25 Ultra) | 42 tok/s (Raspberry Pi 5). |

💡 Why LFM2.5?

  • Faster than SSM hybrids & Gated Delta Networks of similar size.
  • Runs everywhere: Cloud GPUs → CPUs → Mobile devices.
  • Day-one ecosystem support: llama.cpp, MLX, vLLM, SGLang, ONNX, Core ML.

🚀 Quick Start


🍎 1. Use in iOS/macOS Apps (Core ML)

Prerequisites

  • Xcode 15+
  • macOS 14+ / iOS 17+
  • Core ML framework

Step 1: Download the Model

git lfs install
git clone https://huggingface.co/code-and-canvas/lfm2.5-230m-coreML-fp16
cd lfm2.5-230m-coreML-fp16

Step 2: Integrate into Xcode

  1. Drag model.mlpackage into your Xcode project.
  2. Ensure it's added to your app target. Xcode compiles the package to model.mlmodelc at build time, and that compiled model is what ships in the app bundle.
  3. Import Core ML in your Swift code:
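import CoreML

// Xcode compiles model.mlpackage to model.mlmodelc, so look up the compiled model.
guard let modelURL = Bundle.main.url(forResource: "model", withExtension: "mlmodelc") else {
    fatalError("Compiled model not found in the app bundle")
}

do {
    let model = try MLModel(contentsOf: modelURL)

    // Input names, shapes, and types are fixed at conversion time; inspect them first.
    print(model.modelDescription.inputDescriptionsByName)

    // Core ML language models take token IDs, not raw strings.
    // "input_ids" is an assumed feature name; replace it with the one printed above,
    // and fill the array with IDs produced by the tokenizer in hf_model/.
    let tokenIDs = try MLMultiArray(shape: [1, 1], dataType: .int32)
    tokenIDs[0] = 1  // BOS token ID
    let input = try MLDictionaryFeatureProvider(dictionary: ["input_ids": tokenIDs])
    let prediction = try model.prediction(from: input)
    print(prediction.featureNames)  // Names of the output features
} catch {
    print("Core ML error: \(error)")
}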

Step 3: Use with coremltools (Python)

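import coremltools as ct
import numpy as np

# Load the model (prediction through coremltools requires macOS)
model = ct.models.MLModel("model.mlpackage")

# Input names and shapes are fixed at conversion time; inspect them first
print(model.get_spec().description.input)

# Core ML language models take token IDs, not raw strings.
# "input_ids" is an assumed feature name; use the one printed above.
input_data = {"input_ids": np.array([[1]], dtype=np.int32)}  # 1 = BOS token ID
prediction = model.predict(input_data)
print(prediction.keys())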
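
A single model call typically yields scores for the next token only, so generating text means wrapping the call in a loop. The following is a minimal greedy-decoding sketch, not the conversion's actual interface: `next_token_logits` is a hypothetical callable standing in for a wrapper around `model.predict`, and the toy scorer exists only to make the example runnable without the model.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    eos_token_id: int = 7,  # EOS ID from the generation config
    max_new_tokens: int = 50,
) -> List[int]:
    """Repeatedly pick the highest-scoring next token until EOS or the length limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        if next_id == eos_token_id:
            break
        ids.append(next_id)
    return ids[len(prompt_ids):]  # only the newly generated tokens


# Stand-in for the model call (hypothetical): a scorer over a 10-token
# vocabulary that always prefers the token ID following the last one.
def toy_logits(ids: Sequence[int]) -> List[float]:
    target = (ids[-1] + 1) % 10
    return [1.0 if i == target else 0.0 for i in range(10)]


print(greedy_decode(toy_logits, prompt_ids=[1, 3]))  # [4, 5, 6]
```

With the real model, the callable would convert `ids` to the array type the Core ML input expects and return the scores for the last position. Sampling with the default generation settings would replace the `max(...)` step.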

🐍 2. Use with Transformers (Hugging Face Format)

The repository includes both Core ML and Hugging Face formats (hf_model/).

Install Dependencies

pip install transformers torch accelerate

Load and Run

from transformers import AutoModelForCausalLM, AutoTokenizer

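model_id = "code-and-canvas/lfm2.5-230m-coreML-fp16"
# The Hugging Face files live in the hf_model/ subfolder of the repository
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="hf_model")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="hf_model",
    device_map="auto",  # requires the accelerate package
    torch_dtype="auto"
)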

# Generate text
inputs = tokenizer("Write a haiku about coding:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Streaming Generation

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=100)

3. Command Line (llama.cpp)

If you convert to GGUF:

# Example (requires GGUF conversion first)
llama-cli -m lfm2.5-230m-fp16.gguf -p "Explain quantum computing simply."

🛠️ Configuration

Model Architecture

| Setting | Value |
| --- | --- |
| Architecture | Lfm2ForCausalLM |
| Layers | 14 (8× Conv + 6× GQA) |
| Hidden Size | 1,024 |
| Intermediate Size | 2,560 |
| Attention Heads | 16 |
| Key-Value Heads | 8 |
| Vocabulary Size | 65,536 |
| Max Position Embeddings | 128,000 |
| RoPE Theta | 1,000,000 |
| Tie Word Embeddings | True |
| Use Cache | True |
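
These settings also bound how much memory a key-value cache needs at long context. The sketch below is a back-of-the-envelope estimate under stated assumptions: the head dimension is taken to be hidden size / attention heads (1,024 / 16 = 64), only the 6 attention layers cache keys and values, and cache entries are fp16. Whether the Core ML export uses a cache of this shape is not stated in this card.

```python
def kv_cache_bytes(
    seq_len: int,
    attn_layers: int = 6,
    kv_heads: int = 8,
    head_dim: int = 64,        # assumed: hidden size 1,024 / 16 attention heads
    bytes_per_value: int = 2,  # fp16
) -> int:
    """Bytes needed to cache keys and values for seq_len tokens at batch size 1."""
    per_token = attn_layers * 2 * kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return seq_len * per_token


for tokens in (4_096, 32_768, 128_000):
    print(f"{tokens:>7} tokens: {kv_cache_bytes(tokens) / 1e6:,.0f} MB")
```

Under these assumptions each token costs 12,288 bytes of cache, which works out to about 1.57 GB at the full 128,000-token context. With 16 KV heads instead of 8 the figure would double, which is the saving grouped-query attention buys.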

Generation Config (Default)

{
  "bos_token_id": 1,
  "eos_token_id": 7,
  "pad_token_id": 0,
  "do_sample": true,
  "temperature": 0.1,
  "top_k": 50,
  "repetition_penalty": 1.05,
  "use_cache": true
}
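
To make the effect of these defaults concrete, here is a standard-library sketch of one sampling step: repetition penalty, then temperature, then top-k, then a draw from the resulting distribution. It follows the usual order for these steps but is an illustration, not the code path Transformers or Core ML actually runs; the function name and the `rng` parameter are assumptions.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    temperature: float = 0.1,
    top_k: int = 50,
    repetition_penalty: float = 1.05,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the next token ID from raw logits using the default generation settings."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: push down the scores of tokens that were already generated.
    for token_id in set(generated_ids):
        if scores[token_id] > 0:
            scores[token_id] /= repetition_penalty
        else:
            scores[token_id] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution toward the top token.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring candidates.
    k = min(top_k, len(scores))
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the survivors (subtracting the max for numerical stability).
    best = scores[top_ids[0]]
    weights = [math.exp(scores[i] - best) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]


print(sample_next_token([2.0, 1.0, 0.5, -1.0], generated_ids=[]))
```

At temperature 0.1 a one-point logit gap becomes a ten-point gap before the softmax, so even with `do_sample` enabled the output is close to greedy decoding.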

Special Tokens

| Token Type | ID |
| --- | --- |
| BOS | 1 |
| EOS | 7 |
| PAD | 0 |
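
When prompts of different lengths are batched together, the PAD ID fills out the shorter sequences and an attention mask marks which positions are real. A minimal sketch follows, assuming left padding (the common choice for decoder-only generation). The token IDs other than BOS and PAD are arbitrary placeholders, and this card does not say whether the tokenizer adds BOS itself.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # from the special tokens table


def left_pad(
    batch: Sequence[Sequence[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the left to the longest length; return (input_ids, attention_mask)."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[1, 42, 43], [1, 99]])
print(ids)   # [[1, 42, 43], [0, 1, 99]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```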

📁 Files

| File | Description |
| --- | --- |
| model.mlpackage/ | Core ML model (for iOS/macOS integration). |
| hf_model/config.json | Hugging Face model configuration. |
| hf_model/generation_config.json | Default generation parameters. |
| hf_model/model.safetensors | Model weights in safetensors format (fp16). |
| hf_model/tokenizer.json | Tokenizer configuration. |
| hf_model/tokenizer_config.json | Tokenizer metadata. |
| model_config.json | Core ML model configuration. |
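
After cloning, it can help to confirm that the layout matches this list and that the large files were actually fetched. A clone made without Git LFS leaves small text pointer files in place of the weights. The sketch below checks both, and the function names are illustrative.

```python
from pathlib import Path
from typing import List

# Paths taken from the file list above, relative to the repository root.
EXPECTED_FILES = [
    "model.mlpackage",
    "model_config.json",
    "hf_model/config.json",
    "hf_model/generation_config.json",
    "hf_model/model.safetensors",
    "hf_model/tokenizer.json",
    "hf_model/tokenizer_config.json",
]

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def missing_files(repo_dir: str) -> List[str]:
    """Return the expected paths that do not exist under repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]


def is_lfs_pointer(path: str) -> bool:
    """True if the file is a Git LFS pointer stub rather than the real payload."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


repo = "lfm2.5-230m-coreML-fp16"  # directory created by the git clone step
print("missing:", missing_files(repo))
print("weights are an LFS stub:", is_lfs_pointer(f"{repo}/hf_model/model.safetensors"))
```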

🔧 Use Cases

✅ Recommended

  • On-Device AI Apps – Deploy on iPhone, iPad, or Mac with Core ML.
  • Edge Deployment – Run on Raspberry Pi, Jetson, or low-power devices using the Hugging Face weights in hf_model/ (Core ML itself runs only on Apple platforms).
  • Agentic Workflows – Lightweight model for automation, chatbots, or tools.
  • Local Inference – Fast, private inference without cloud dependency.
  • Prototyping – Quickly test ideas offline or on-device.

⚠️ Considerations

| Aspect | LFM2.5-230M (fp16) |
| --- | --- |
| Size | ~1.06 GB |
| Speed | Very Fast (213 tok/s on high-end mobile) |
| Accuracy | Competitive with larger models (thanks to distillation) |
| Context | 128K tokens (long conversations, documents) |
| Multimodal | ❌ Text-only |
| Fine-Tunable | ✅ Yes (Hugging Face format included) |

💡 Why choose this over larger models?

  • Speed: Optimized for real-time on-device inference.
  • Size: Fits on mobile devices with limited storage.
  • Efficiency: Lower power consumption and memory usage.

🔄 Training & Architecture Details

Base Model (LiquidAI/LFM2.5-230M-Base)

  • Pre-training: 19T tokens, including a 32K context extension phase.

Post-Training (LFM2.5-230M)

  1. Supervised Fine-Tuning (SFT) – Distilled from LFM2.5-350M.
  2. Direct Preference Optimization (DPO) – Alignment for quality.
  3. Multi-Domain Reinforcement Learning (RL) – Flexibility for downstream tasks.

Architecture (LFM2)

  • Hybrid Design: Combines convolution (LIV blocks) and attention (GQA).
  • Double-Gated LIV Convolution: 8 layers for efficient sequence processing.
  • Grouped-Query Attention (GQA): 6 layers for scalable attention.
  • Efficiency: Faster than SSM hybrids and Gated Delta Networks of similar size.
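
In grouped-query attention, several query heads share one key-value head. With 16 attention heads and 8 KV heads, each KV head serves two query heads. The sketch below shows the mapping under the common convention of contiguous groups; the grouping order actually used inside LFM2 is an assumption here.

```python
from typing import Dict, List


def query_heads_per_kv_head(num_heads: int = 16, num_kv_heads: int = 8) -> Dict[int, List[int]]:
    """Map each KV head index to the query head indices that share it."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    group = num_heads // num_kv_heads
    return {kv: list(range(kv * group, (kv + 1) * group)) for kv in range(num_kv_heads)}


groups = query_heads_per_kv_head()
print(groups[0])  # [0, 1]
print(groups[7])  # [14, 15]
```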

Core ML Conversion

  • Precision: fp16 (16-bit floating point) for balance of speed and accuracy.
  • Compatibility: Targets Apple devices that meet the prerequisites listed in Quick Start (macOS 14+ / iOS 17+).
  • Format: .mlpackage (modern Core ML bundle format).


📜 License

Other – Refer to LiquidAI's terms for usage rights. (Original LFM2.5 models may have specific licensing; verify before commercial use.)



🙏 Acknowledgments

  • Base Model: LiquidAI/LFM2.5-230M (Liquid AI).
  • Architecture: LFM2 (Hybrid convolution + attention).
  • Core ML: Apple Core ML framework.
  • Conversion: Powered by coremltools and Hugging Face ecosystem.


💬 Example Prompts

General Q&A

What are the key differences between Python and JavaScript?

Coding

Write a Python function to reverse a linked list in-place.

Creative Writing

Write a short story about a robot discovering emotions.

Agentic Tasks

Act as a personal assistant. My calendar is empty tomorrow. Suggest 3 productive things I could do.

Long Context

Here's a 500-line Python script. Can you:
1. Summarize what it does.
2. Identify potential bugs.
3. Suggest improvements.
[Insert long script here...]


🚀 Happy Building! Built with ❤️ by Code and Canvas.
