LFM2.5-230M - Core ML (fp16)
A Core ML conversion of LiquidAI's LFM2.5-230M in 16-bit floating point (fp16), intended for fast on-device inference on Apple devices (iPhone, iPad, Mac). The weights are kept in unquantized fp16 to preserve accuracy, and the model integrates with iOS/macOS apps through Core ML.
🔍 Model Overview
| Feature | Details |
|---|---|
| Base Model | LiquidAI/LFM2.5-230M-Base |
| Fine-tuned Model | LiquidAI/LFM2.5-230M |
| Precision | fp16 (16-bit floating point) |
| Framework | Core ML (Apple's machine learning framework) |
| Architecture | Lfm2ForCausalLM (Hybrid: 8× Double-Gated LIV Convolution + 6× Grouped-Query Attention) |
| Parameters | 230M |
| Context Length | 128,000 tokens |
| Vocabulary | 65,536 tokens |
| License | Other (Check LiquidAI for details) |
| Download Size | ~1.06 GB (.mlpackage + .safetensors) |
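As a rough sanity check on the download size, the fp16 weight footprint can be estimated from the figures in the table. This is a sketch that assumes the nominal 230M parameter count and 2 bytes per fp16 value. It ignores tokenizer files and package metadata, so it only approximates the ~1.06 GB listed above.

```python
# Back-of-the-envelope fp16 size estimate from the table above.
# Assumption: the nominal 230M parameter count is used as-is.
PARAMS = 230_000_000
BYTES_PER_FP16 = 2
COPIES = 2  # the weights ship twice: .mlpackage and .safetensors


def fp16_size_gb(params: int, copies: int = 1) -> float:
    """Return the size in decimal gigabytes of `copies` fp16 copies."""
    return params * BYTES_PER_FP16 * copies / 1e9


per_copy = fp16_size_gb(PARAMS)
both = fp16_size_gb(PARAMS, COPIES)
print(f"one copy: {per_copy:.2f} GB, both formats: {both:.2f} GB")
```

If only the Core ML package is bundled into an app, the per-copy figure is the more relevant one.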
🎯 Capabilities
- ✅ On-Device Inference – Optimized for Apple Silicon (iPhone, iPad, Mac) via Core ML.
- ✅ Blazing Fast – 213 tok/s on Galaxy S25 Ultra, 42 tok/s on Raspberry Pi 5 (LiquidAI benchmarks).
- ✅ Efficient Architecture – Hybrid convolution + attention layers for speed and accuracy.
- ✅ Long Context – 128K token context window (32K extension phase included in training).
- ✅ General-Purpose – Text-only model trained on 19T tokens with distillation, DPO, and RL.
- ✅ Agentic-Ready – Designed for agent workflows, automation, and edge deployment.
📊 Performance Highlights
| Metric | Value | Notes |
|---|---|---|
| Parameters | 230M | Compact yet powerful. |
| Layers | 14 | 8× Convolution (LIV) + 6× Grouped-Query Attention (GQA). |
| Hidden Size | 1,024 | |
| Intermediate Size | 2,560 | |
| Attention Heads | 16 (8 KV heads) | Grouped-Query Attention for efficiency. |
| Pre-training Tokens | 19T | Including 32K context extension phase. |
| Post-Training | SFT + DPO + Multi-Domain RL | Distilled from LFM2.5-350M for competitive performance. |
| Speed (Edge) | 213 tok/s (S25 Ultra) | 42 tok/s (Raspberry Pi 5). |
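The attention figures in the table also give a feel for how much memory the key/value cache needs at long context. The sketch below is an estimate under stated assumptions, not a measurement: it assumes the head dimension equals hidden size divided by the number of attention heads, an fp16 cache, and it ignores any state kept by the convolution layers.

```python
# Rough KV-cache estimate for the attention layers only.
# Assumptions (not stated in the card): head_dim = hidden_size / num_heads,
# the cache is stored in fp16, and convolution-layer state is ignored.
HIDDEN_SIZE = 1024
NUM_HEADS = 16
NUM_KV_HEADS = 8
NUM_ATTENTION_LAYERS = 6
BYTES_PER_FP16 = 2


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    head_dim = HIDDEN_SIZE // NUM_HEADS  # 64 under the assumption above
    per_token = 2 * NUM_ATTENTION_LAYERS * kv_heads * head_dim * BYTES_PER_FP16
    return per_token * num_tokens


for n in (4_096, 32_000, 128_000):
    gqa = kv_cache_bytes(n)
    mha = kv_cache_bytes(n, kv_heads=NUM_HEADS)
    print(f"{n:>7} tokens: GQA {gqa / 2**20:8.1f} MiB, full MHA {mha / 2**20:8.1f} MiB")
```

With 8 KV heads instead of 16, the cache is half the size it would be with one KV head per attention head.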
💡 Why LFM2.5?
- Faster than SSM hybrids & Gated Delta Networks of similar size.
- Runs everywhere: Cloud GPUs → CPUs → Mobile devices.
- Day-one ecosystem support: llama.cpp, MLX, vLLM, SGLang, ONNX, Core ML.
🚀 Quick Start
🍎 1. Use in iOS/macOS Apps (Core ML)
Prerequisites
- Xcode 15+
- macOS 14+ / iOS 17+
- Core ML framework
Step 1: Download the Model
git lfs install
git clone https://huggingface.co/code-and-canvas/lfm2.5-230m-coreML-fp16
cd lfm2.5-230m-coreML-fp16
Step 2: Integrate into Xcode
- Drag model.mlpackage into your Xcode project.
- Ensure it's added to your target's "Build Phases" → "Copy Bundle Resources".
- Import Core ML in your Swift code:
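import CoreML
// Locate the model package that was copied into the app bundle.
// If Xcode compiled the model at build time, look up "mlmodelc" instead and skip compileModel.
guard let modelURL = Bundle.main.url(forResource: "model", withExtension: "mlpackage") else {
    fatalError("Model file not found")
}
do {
    // Core ML loads compiled models, so compile the package first
    let compiledURL = try MLModel.compileModel(at: modelURL)
    let model = try MLModel(contentsOf: compiledURL)
    // Print the real input names and types before building a feature provider
    print(model.modelDescription.inputDescriptionsByName)
    // Placeholder input: replace "input" and its value with what the description reports
    let input = try MLDictionaryFeatureProvider(dictionary: ["input": "Your prompt here"])
    let prediction = try model.prediction(from: input)
    // Handle output
    print(prediction.featureNames)
} catch {
    print("Core ML error: \(error)")
}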
Step 3: Use with coremltools (Python)
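import coremltools as ct
# Load the model (running predictions with coremltools requires macOS)
model = ct.models.MLModel("model.mlpackage")
# Inspect the real input names and types
print(model.get_spec().description.input)
# Run inference; "input" and the string value are placeholders for the inputs reported above
input_data = {"input": "What is the capital of France?"}
prediction = model.predict(input_data)
print(prediction)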
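A Core ML language model usually performs one forward pass per call, so producing text means wrapping the prediction call in a decoding loop. The sketch below shows a greedy loop in plain Python. `next_token_logits` is a hypothetical stand-in for whatever function wraps `model.predict` and returns one score per vocabulary entry; the actual input and output features of this package are not documented here and should be checked first. The EOS ID comes from the Special Tokens table.

```python
from typing import Callable, List, Sequence

EOS_TOKEN_ID = 7  # from the Special Tokens table


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int = 50,
    eos_token_id: int = EOS_TOKEN_ID,
) -> List[int]:
    """Append the highest-scoring token until EOS or the token budget is hit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_token_id:
            break
        ids.append(next_id)
    return ids


# Toy stand-in for the model (hypothetical): always prefers last token + 1,
# so the sequence counts upward until it reaches EOS.
def toy_logits(ids: Sequence[int]) -> List[float]:
    vocab_size = 10
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_logits, [1, 3]))
```

In a real integration the prompt would first be converted to token IDs with the tokenizer in hf_model/, and the returned IDs decoded back to text.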
🐍 2. Use with Transformers (Hugging Face Format)
The repository includes both Core ML and Hugging Face formats (hf_model/).
Install Dependencies
pip install transformers torch accelerate
Load and Run
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "code-and-canvas/lfm2.5-230m-coreML-fp16"
# The Hugging Face files live in the hf_model/ subfolder of the repository
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="hf_model")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="hf_model",
    device_map="auto",  # requires the accelerate package
    torch_dtype="auto",
)
# Generate text
inputs = tokenizer("Write a haiku about coding:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Streaming Generation
from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, streamer=streamer, max_new_tokens=100)
3. Command Line (llama.cpp)
If you convert to GGUF:
# Example (requires GGUF conversion first)
llama-cli -m lfm2.5-230m-fp16.gguf -p "Explain quantum computing simply."
🛠️ Configuration
Model Architecture
| Setting | Value |
|---|---|
| Architecture | Lfm2ForCausalLM |
| Layers | 14 (8× Conv + 6× GQA) |
| Hidden Size | 1,024 |
| Intermediate Size | 2,560 |
| Attention Heads | 16 |
| Key-Value Heads | 8 |
| Vocabulary Size | 65,536 |
| Max Position Embeddings | 128,000 |
| RoPE Theta | 1,000,000 |
| Tie Word Embeddings | True |
| Use Cache | True |
Generation Config (Default)
{
  "bos_token_id": 1,
  "eos_token_id": 7,
  "pad_token_id": 0,
  "do_sample": true,
  "temperature": 0.1,
  "top_k": 50,
  "repetition_penalty": 1.05,
  "use_cache": true
}
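To see how these defaults shape the next-token distribution, the following simplified sketch applies a repetition penalty, temperature scaling, and top-k filtering to a list of logits, in that order. It is an illustration written for this card, not the library's implementation. The function name and the example logits are made up for the demonstration.

```python
import math
from typing import Iterable, List


def next_token_probs(
    logits: List[float],
    previous_ids: Iterable[int] = (),
    temperature: float = 0.1,
    top_k: int = 50,
    repetition_penalty: float = 1.05,
) -> List[float]:
    """Return next-token probabilities after penalty, temperature and top-k."""
    scores = list(logits)
    # Repetition penalty: make already-generated tokens less likely.
    for i in set(previous_ids):
        if scores[i] > 0:
            scores[i] /= repetition_penalty
        else:
            scores[i] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest scores (ties at the cutoff are kept).
    if top_k < len(scores):
        cutoff = sorted(scores, reverse=True)[top_k - 1]
        scores = [s if s >= cutoff else -math.inf for s in scores]
    # Softmax, shifted by the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = next_token_probs([2.0, 1.9, 0.5, -1.0], previous_ids=[0], top_k=2)
print([round(p, 3) for p in probs])
```

With a temperature of 0.1, even a small logit gap becomes a large probability gap, so sampling with these defaults behaves close to greedy decoding.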
Special Tokens
| Token Type | ID |
|---|---|
| BOS | 1 |
| EOS | 7 |
| PAD | 0 |
📁 Files
| File | Description |
|---|---|
| model.mlpackage/ | Core ML model (for iOS/macOS integration). |
| hf_model/config.json | Hugging Face model configuration. |
| hf_model/generation_config.json | Default generation parameters. |
| hf_model/model.safetensors | Model weights in safetensors format (fp16). |
| hf_model/tokenizer.json | Tokenizer configuration. |
| hf_model/tokenizer_config.json | Tokenizer metadata. |
| model_config.json | Core ML model configuration. |
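After cloning, it can be useful to confirm that every file listed above is present, for example when Git LFS was not installed and the clone is incomplete. The helper below is a small sketch; the function name is hypothetical, and the directory name assumes the clone command from the Quick Start.

```python
from pathlib import Path
from typing import List

# Paths taken from the Files table; model.mlpackage is a directory.
EXPECTED_PATHS = [
    "model.mlpackage",
    "model_config.json",
    "hf_model/config.json",
    "hf_model/generation_config.json",
    "hf_model/model.safetensors",
    "hf_model/tokenizer.json",
    "hf_model/tokenizer_config.json",
]


def missing_paths(repo_dir: str) -> List[str]:
    """Return the expected paths that do not exist under `repo_dir`."""
    root = Path(repo_dir)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths("lfm2.5-230m-coreML-fp16")
    print("all files present" if not missing else f"missing: {missing}")
```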
🔧 Use Cases
✅ Recommended
- On-Device AI Apps – Deploy on iPhone, iPad, or Mac with Core ML.
- Edge Deployment – Run on Raspberry Pi, Jetson, or low-power devices.
- Agentic Workflows – Lightweight model for automation, chatbots, or tools.
- Local Inference – Fast, private inference without cloud dependency.
- Prototyping – Quickly test ideas offline or on-device.
⚠️ Considerations
| Aspect | LFM2.5-230M (fp16) |
|---|---|
| Size | ~1.06 GB |
| Speed | Very Fast (213 tok/s on high-end mobile) |
| Accuracy | Competitive with larger models (thanks to distillation) |
| Context | 128K tokens (long conversations, documents) |
| Multimodal | ❌ Text-only |
| Fine-Tunable | ✅ Yes (Hugging Face format included) |
💡 Why choose this over larger models?
- Speed: Optimized for real-time on-device inference.
- Size: Fits on mobile devices with limited storage.
- Efficiency: Lower power consumption and memory usage.
🔄 Training & Architecture Details
Base Model
- Original: LiquidAI/LFM2.5-230M-Base
- Pre-training: 19T tokens, including 32K context extension phase.
Post-Training (LFM2.5-230M)
- Supervised Fine-Tuning (SFT) – Distilled from LFM2.5-350M.
- Direct Preference Optimization (DPO) – Alignment for quality.
- Multi-Domain Reinforcement Learning (RL) – Flexibility for downstream tasks.
Architecture (LFM2)
- Hybrid Design: Combines convolution (LIV blocks) and attention (GQA).
- Double-Gated LIV Convolution: 8 layers for efficient sequence processing.
- Grouped-Query Attention (GQA): 6 layers for scalable attention.
- Efficiency: Faster than SSM hybrids and Gated Delta Networks of similar size.
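Grouped-query attention lets several query heads share one key/value head. With the 16 attention heads and 8 KV heads listed in the configuration, each KV head serves two query heads. The sketch below shows that mapping, assuming the common convention of contiguous groups; the card does not state how LFM2 assigns heads to groups.

```python
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8


def kv_head_for(query_head: int) -> int:
    """Map a query head to the KV head it shares (contiguous groups assumed)."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return query_head // group_size


# Collect the query heads served by each KV head.
groups = {}
for q in range(NUM_QUERY_HEADS):
    groups.setdefault(kv_head_for(q), []).append(q)
print(groups)
```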
Core ML Conversion
- Precision: fp16 (16-bit floating point) for balance of speed and accuracy.
- Compatibility: Runs on Apple devices that meet the prerequisites listed in the Quick Start (macOS 14+ / iOS 17+).
- Format: .mlpackage (modern Core ML bundle format).
📜 License
Other – Refer to LiquidAI's terms for usage rights. (Original LFM2.5 models may have specific licensing; verify before commercial use.)
🙏 Acknowledgments
- Base Model: LiquidAI/LFM2.5-230M (Liquid AI).
- Architecture: LFM2 (Hybrid convolution + attention).
- Core ML: Apple Core ML framework.
- Conversion: Powered by coremltools and the Hugging Face ecosystem.
💬 Example Prompts
General Q&A
What are the key differences between Python and JavaScript?
Coding
Write a Python function to reverse a linked list in-place.
Creative Writing
Write a short story about a robot discovering emotions.
Agentic Tasks
Act as a personal assistant. My calendar is empty tomorrow. Suggest 3 productive things I could do.
Long Context
Here's a 500-line Python script. Can you:
1. Summarize what it does.
2. Identify potential bugs.
3. Suggest improvements.
[Insert long script here...]
🐛 Issues & Support
- Bugs: Open an issue on the repository.
- Questions: Ask in Discussions.
- Contributions: PRs welcome!
📚 Additional Resources
🚀 Happy Building! Built with ❤️ by Code and Canvas.