LFM2.5-230M - Core ML (fp16)

A Core ML conversion of LiquidAI's LFM2.5-230M with 16-bit floating-point (fp16) weights, intended for on-device inference on Apple devices (iPhone, iPad, Mac). The weights are stored in fp16 without further quantization, and the model integrates with iOS/macOS apps through Core ML.

🔍 Model Overview

Feature	Details
Base Model	`LiquidAI/LFM2.5-230M-Base`
Fine-tuned Model	`LiquidAI/LFM2.5-230M`
Precision	fp16 (16-bit floating point)
Framework	Core ML (Apple's machine learning framework)
Architecture	`Lfm2ForCausalLM` (Hybrid: 8× Double-Gated LIV Convolution + 6× Grouped-Query Attention)
Parameters	230M
Context Length	128,000 tokens
Vocabulary	65,536 tokens
License	Other (Check LiquidAI for details)
Download Size	~1.06 GB (`.mlpackage` + `.safetensors`)
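
The download size follows mostly from the precision: each fp16 weight takes two bytes, and the repository ships the weights twice (once in the `.mlpackage`, once as `.safetensors`). The sketch below is a rough estimate that uses the rounded "230M" figure from the table as an assumption. It is not a measurement of the actual files.

```python
# Back-of-the-envelope fp16 storage estimate (nominal figures, not measured).
BYTES_PER_FP16 = 2


def fp16_weight_bytes(num_params: int) -> int:
    """Bytes needed to store num_params weights at 16 bits each."""
    return num_params * BYTES_PER_FP16


nominal_params = 230_000_000        # "230M" from the table, rounded (assumption)
one_copy = fp16_weight_bytes(nominal_params)
two_copies = 2 * one_copy           # .mlpackage + .safetensors

print(f"one copy:   {one_copy / 1e9:.2f} GB")
print(f"two copies: {two_copies / 1e9:.2f} GB")
```

The estimate lands somewhat below the listed ~1.06 GB. The parameter count is rounded, and the sketch ignores tokenizer files and package metadata.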

🎯 Capabilities

✅ On-Device Inference – Runs on Apple Silicon (iPhone, iPad, Mac) via Core ML.
✅ Fast on Edge Hardware – 213 tok/s on Galaxy S25 Ultra and 42 tok/s on Raspberry Pi 5 (LiquidAI benchmarks on non-Apple devices, not measurements of this Core ML build).
✅ Efficient Architecture – Hybrid convolution + attention layers for speed and accuracy.
✅ Long Context – 128K token context window (32K extension phase included in training).
✅ General-Purpose – Text-only model trained on 19T tokens with distillation, DPO, and RL.
✅ Agentic-Ready – Designed for agent workflows, automation, and edge deployment.

📊 Performance Highlights

Metric	Value	Notes
Parameters	230M	Compact yet powerful.
Layers	14	8× Convolution (LIV) + 6× Grouped-Query Attention (GQA).
Hidden Size	1,024
Intermediate Size	2,560
Attention Heads	16 (8 KV heads)	Grouped-Query Attention for efficiency.
Pre-training Tokens	19T	Including 32K context extension phase.
Post-Training	SFT + DPO + Multi-Domain RL	Distilled from LFM2.5-350M for competitive performance.
Speed (Edge)	213 tok/s (Galaxy S25 Ultra), 42 tok/s (Raspberry Pi 5)	LiquidAI benchmarks on non-Apple hardware.
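
To make the "16 (8 KV heads)" row concrete: with grouped-query attention, several query heads share one key/value head. The sketch below assumes contiguous grouping (query heads 0 and 1 read KV head 0, and so on), which is how common implementations lay it out. The card itself does not state the grouping order, and `kv_head_for` is a hypothetical helper name.

```python
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8


def kv_head_for(query_head: int) -> int:
    """Index of the KV head that a given query head reads from (contiguous grouping assumed)."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return query_head // group_size


mapping = {q: kv_head_for(q) for q in range(NUM_QUERY_HEADS)}
print(mapping)
```

Halving the number of KV heads halves the key/value cache that has to be kept per token, which is where the efficiency gain comes from.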

💡 Why LFM2.5?

Speed: According to Liquid AI, faster than SSM hybrids and Gated Delta Networks of similar size.

Runs everywhere: Cloud GPUs → CPUs → Mobile devices.

Day-one ecosystem support: llama.cpp, MLX, vLLM, SGLang, ONNX, Core ML.

🚀 Quick Start

🍎 1. Use in iOS/macOS Apps (Core ML)

Prerequisites

Xcode 15+
macOS 14+ / iOS 17+
Core ML framework

Step 1: Download the Model

git lfs install
git clone https://huggingface.co/code-and-canvas/lfm2.5-230m-coreML-fp16
cd lfm2.5-230m-coreML-fp16

Step 2: Integrate into Xcode

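Drag model.mlpackage into your Xcode project and add it to your app target.
Xcode compiles the package into model.mlmodelc at build time, so check that it is listed in your target's "Build Phases" (typically under "Compile Sources").
Import Core ML in your Swift code and load the compiled model: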

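import CoreML

// Xcode compiles model.mlpackage into model.mlmodelc at build time,
// so the app bundle contains the compiled model rather than the package.
guard let modelURL = Bundle.main.url(forResource: "model", withExtension: "mlmodelc") else {
    fatalError("Compiled model not found in app bundle")
}

do {
    let model = try MLModel(contentsOf: modelURL)
    // Print the real input/output feature names, shapes, and types.
    print(model.modelDescription)

    // The model consumes token IDs, not raw text. "input_ids" is an assumed
    // feature name and [1, 2, 3] are placeholder IDs; tokenize your prompt first.
    let tokenIDs: [Int32] = [1, 2, 3]
    let inputIDs = try MLMultiArray(shape: [1, NSNumber(value: tokenIDs.count)], dataType: .int32)
    for (i, id) in tokenIDs.enumerated() {
        inputIDs[i] = NSNumber(value: id)
    }
    let input = try MLDictionaryFeatureProvider(dictionary: ["input_ids": inputIDs])
    let prediction = try model.prediction(from: input)
    // Read the logits from `prediction` using the output name printed above.
    print(prediction.featureNames)
} catch {
    print("Core ML error: \(error)")
}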

Step 3: Use with `coremltools` (Python)

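import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

# Load the model (running predictions requires macOS)
model = ct.models.MLModel("model.mlpackage")

# Inspect the real input names, shapes, and dtypes before building inputs
print(model.get_spec().description.input)

# Core ML expects token IDs, not raw text. "input_ids" is an assumed feature name.
tokenizer = AutoTokenizer.from_pretrained("hf_model")
input_ids = tokenizer("What is the capital of France?", return_tensors="np")["input_ids"]
prediction = model.predict({"input_ids": input_ids.astype(np.int32)})
print(prediction.keys())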
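
A single `predict` call is one forward pass, so producing text means calling the model in a loop and appending one token at a time. The sketch below shows greedy decoding against a hypothetical `next_token_logits` callable. In practice that callable would wrap `model.predict` and return the logits for the last position, but the exact input and output names of this package are not documented here, so a stand-in function is used instead.

```python
from typing import Callable, List, Sequence

EOS_TOKEN_ID = 7  # from the generation config in this card


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int = 50,
) -> List[int]:
    """Repeatedly pick the highest-scoring next token until EOS or the budget runs out."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # one forward pass over the sequence so far
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == EOS_TOKEN_ID:
            break
    return ids[len(prompt_ids):]


# Stand-in for the Core ML call so the sketch runs anywhere: it always
# prefers token (last_id + 1) % 10, so the output counts upward until EOS (7).
def fake_model(ids: List[int]) -> List[float]:
    scores = [0.0] * 10
    scores[(ids[-1] + 1) % 10] = 1.0
    return scores


print(greedy_decode(fake_model, prompt_ids=[3]))
```

This version re-runs the whole sequence on every step. A production loop would normally reuse cached state between steps, which depends on how the Core ML package was exported.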

🐍 2. Use with Transformers (Hugging Face Format)

The repository includes both Core ML and Hugging Face formats (hf_model/).

Install Dependencies

pip install transformers torch accelerate

Load and Run

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "code-and-canvas/lfm2.5-230m-coreML-fp16"
# The Hugging Face files live in the hf_model/ subfolder of this repository
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="hf_model")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="hf_model",
    device_map="auto",  # requires the accelerate package
    torch_dtype="auto"
)

# Generate text
inputs = tokenizer("Write a haiku about coding:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Streaming Generation

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=100)

3. Command Line (llama.cpp)

If you convert to GGUF:

# Example (requires GGUF conversion first)
llama-cli -m lfm2.5-230m-fp16.gguf -p "Explain quantum computing simply."

🛠️ Configuration

Model Architecture

Setting	Value
Architecture	`Lfm2ForCausalLM`
Layers	14 (8× Conv + 6× GQA)
Hidden Size	1,024
Intermediate Size	2,560
Attention Heads	16
Key-Value Heads	8
Vocabulary Size	65,536
Max Position Embeddings	128,000
RoPE Theta	1,000,000
Tie Word Embeddings	True
Use Cache	True
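
These settings also determine how much memory the attention cache needs as the context grows, which matters on phones. The sketch below is a rough estimate under two assumptions that the card does not state: the head dimension equals hidden size divided by attention heads (64), and only the 6 attention layers keep per-token state (the convolution layers are ignored, on the assumption that their state does not grow with sequence length).

```python
NUM_ATTENTION_LAYERS = 6
NUM_KV_HEADS = 8
HIDDEN_SIZE = 1024
NUM_ATTENTION_HEADS = 16
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 64; assumed, not stated in the card
BYTES_PER_FP16 = 2


def kv_cache_bytes(num_tokens: int) -> int:
    """fp16 bytes for keys and values across all attention layers."""
    per_token = 2 * NUM_ATTENTION_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_FP16
    return per_token * num_tokens


for tokens in (4_096, 32_768, 128_000):
    print(f"{tokens:>7} tokens: {kv_cache_bytes(tokens) / 2**20:8.1f} MiB")
```

Under these assumptions the cache costs 12 KiB per token, so filling the whole 128,000-token window would take roughly 1.5 GiB on top of the weights. Shorter contexts are a safer default on memory-constrained devices.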

Generation Config (Default)

{
  "bos_token_id": 1,
  "eos_token_id": 7,
  "pad_token_id": 0,
  "do_sample": true,
  "temperature": 0.1,
  "top_k": 50,
  "repetition_penalty": 1.05,
  "use_cache": true
}
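
To illustrate what these defaults do at each decoding step, here is a minimal pure-Python sketch of repetition penalty, temperature scaling, and top-k filtering applied to a list of logits. It mirrors the usual order of these steps but is not the library's implementation. `next_token_probs` and `sample` are hypothetical helper names and the logits are made up.

```python
import math
import random
from typing import Dict, List, Sequence


def next_token_probs(
    logits: Sequence[float],
    previous_ids: Sequence[int],
    temperature: float = 0.1,
    top_k: int = 50,
    repetition_penalty: float = 1.05,
) -> Dict[int, float]:
    """Return {token_id: probability} after penalty, temperature, and top-k."""
    scores: List[float] = list(logits)
    # Repetition penalty: make already-seen tokens less likely.
    for tok in set(previous_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring tokens.
    kept = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = scores[kept[0]]
    weights = {tok: math.exp(scores[tok] - peak) for tok in kept}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample(probs: Dict[int, float], rng: random.Random) -> int:
    """Draw one token ID according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


probs = next_token_probs([2.0, 1.9, 0.5, -1.0], previous_ids=[0])
print(max(probs, key=probs.get), sample(probs, random.Random(0)))
```

With a temperature of 0.1 the distribution is sharply peaked, so sampling stays close to greedy decoding even though `do_sample` is enabled.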

Special Tokens

Token Type	ID
BOS	1
EOS	7
PAD	0
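
When post-processing raw token IDs by hand (for example, IDs taken from a Core ML prediction and not from `tokenizer.decode`), the special tokens above need to be removed before the text is shown. A minimal sketch, with `strip_special` as a hypothetical helper name:

```python
from typing import List

BOS_ID, EOS_ID, PAD_ID = 1, 7, 0  # values from the Special Tokens table


def strip_special(ids: List[int]) -> List[int]:
    """Cut the sequence at the first EOS, then drop BOS and PAD tokens."""
    if EOS_ID in ids:
        ids = ids[: ids.index(EOS_ID)]
    return [t for t in ids if t not in (BOS_ID, PAD_ID)]


print(strip_special([1, 42, 99, 7, 0, 0]))
```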

📁 Files

File	Description
`model.mlpackage/`	Core ML model (for iOS/macOS integration).
`hf_model/config.json`	Hugging Face model configuration.
`hf_model/generation_config.json`	Default generation parameters.
`hf_model/model.safetensors`	Model weights in safetensors format (fp16).
`hf_model/tokenizer.json`	Tokenizer configuration.
`hf_model/tokenizer_config.json`	Tokenizer metadata.
`model_config.json`	Core ML model configuration.
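
After cloning, a quick check that every file in the table is present can save debugging time later. This is a small sketch using only the paths listed above. `missing_files` is a hypothetical helper, and it only tests for existence, so it will not detect Git LFS pointer files that were never replaced by the real weights.

```python
from pathlib import Path
from typing import List

EXPECTED = [
    "model.mlpackage",
    "hf_model/config.json",
    "hf_model/generation_config.json",
    "hf_model/model.safetensors",
    "hf_model/tokenizer.json",
    "hf_model/tokenizer_config.json",
    "model_config.json",
]


def missing_files(repo_dir: str) -> List[str]:
    """Return the expected paths that do not exist under repo_dir."""
    root = Path(repo_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("All files present" if not missing else f"Missing: {missing}")
```

Run it from inside the cloned `lfm2.5-230m-coreML-fp16` directory.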

🔧 Use Cases

✅ Recommended

On-Device AI Apps – Deploy on iPhone, iPad, or Mac with Core ML.
Edge Deployment – Run on Raspberry Pi, Jetson, or other low-power devices using the Hugging Face weights in hf_model/ (the Core ML package itself is Apple-only).
Agentic Workflows – Lightweight model for automation, chatbots, or tools.
Local Inference – Fast, private inference without cloud dependency.
Prototyping – Quickly test ideas offline or on-device.

⚠️ Considerations

Aspect	LFM2.5-230M (fp16)
Size	~1.06 GB
Speed	213 tok/s on Galaxy S25 Ultra (LiquidAI benchmark, not measured with Core ML)
Accuracy	Competitive with larger models (thanks to distillation)
Context	128K tokens (long conversations, documents)
Multimodal	❌ Text-only
Fine-Tunable	✅ Yes (Hugging Face format included)

💡 Why choose this over larger models?

Speed: Optimized for real-time on-device inference.

Size: Fits on mobile devices with limited storage.

Efficiency: Lower power consumption and memory usage.

🔄 Training & Architecture Details

Base Model

Original: LiquidAI/LFM2.5-230M-Base
Pre-training: 19T tokens, including 32K context extension phase.

Post-Training (LFM2.5-230M)

Supervised Fine-Tuning (SFT) – Distilled from LFM2.5-350M.
Direct Preference Optimization (DPO) – Alignment for quality.
Multi-Domain Reinforcement Learning (RL) – Flexibility for downstream tasks.

Architecture (LFM2)

Hybrid Design: Combines convolution (LIV blocks) and attention (GQA).
Double-Gated LIV Convolution: 8 layers for efficient sequence processing.
Grouped-Query Attention (GQA): 6 layers for scalable attention.
Efficiency: Faster than SSM hybrids and Gated Delta Networks of similar size.

Core ML Conversion

Precision: fp16 (16-bit floating point) for balance of speed and accuracy.
Compatibility: Targets Apple devices that meet the prerequisites listed under Quick Start (macOS 14+ / iOS 17+).
Format: .mlpackage (modern Core ML bundle format).

📜 License

Other – Refer to LiquidAI's terms for usage rights. (Original LFM2.5 models may have specific licensing; verify before commercial use.)

🙏 Acknowledgments

Base Model: LiquidAI/LFM2.5-230M (Liquid AI).
Architecture: LFM2 (Hybrid convolution + attention).
Core ML: Apple Core ML framework.
Conversion: Powered by coremltools and Hugging Face ecosystem.

💬 Example Prompts

General Q&A

What are the key differences between Python and JavaScript?

Coding

Write a Python function to reverse a linked list in-place.

Creative Writing

Write a short story about a robot discovering emotions.

Agentic Tasks

Act as a personal assistant. My calendar is empty tomorrow. Suggest 3 productive things I could do.

Long Context

Here's a 500-line Python script. Can you:
1. Summarize what it does.
2. Identify potential bugs.
3. Suggest improvements.
[Insert long script here...]

🐛 Issues & Support

Bugs: Open an issue on the repository.
Questions: Ask in Discussions.
Contributions: PRs welcome!

🚀 Happy Building! Built with ❤️ by Code and Canvas.
