# Model Card for Meow-Omni 1
Meow-Omni 1 is the world’s first Multimodal Large Language Model (MLLM) specifically engineered for Computational Ethology. It natively co-embeds four distinct modalities—Text, Video, Audio, and Biological Time-Series—to decode the latent intentions of non-verbal species.
## 🐾 Model Summary
Meow-Omni 1 is the fine-tuned, intent-aligned version of the Meow-Omni 1-Base architecture. By training on the Meow-10K dataset with a novel Next-Behaviour Prediction (NBP) objective, the model moves beyond simple action recognition to resolve "semantic aliasing"—distinguishing, for example, between contentment-purring and distress-purring by correlating vocalizations with internal physiological markers (ECG/EEG).
- Fine-tuned from: Meow-Omni 1-Base
- Primary Task: Feline Intention Decoding and Behavioural Interpretation
## 🚀 Key Features
- Quad-Modal Reasoning: Simultaneously processes visual cues, acoustic signals, and high-frequency biometrics within a single transformer context.
- Explainable Ethology: Unlike black-box classifiers, Meow-Omni 1 can articulate the causal relationship between a physiological spike and a behavioural display in natural language.
- Uncertainty Quantification: Built-in predictive entropy allows the model to "flag" ambiguous or contradictory signals (e.g., when biometrics contradict visual cues), ensuring clinical safety.
- Lightweight Deployment: Engineered with minimal dependencies to ensure reproducibility and accessibility for researchers in wildlife conservation.
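To make the uncertainty-flagging idea concrete, here is a minimal sketch of how predictive entropy can gate ambiguous predictions. The threshold value and the helper `predictive_entropy` are illustrative assumptions, not part of the released API:

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution over intent classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

THRESHOLD = 1.0  # hypothetical flagging threshold in nats

# A confident prediction (one dominant intent class) has low entropy;
# an ambiguous one (e.g., biometrics contradicting visual cues) has high entropy.
confident = [0.97, 0.01, 0.01, 0.01]
ambiguous = [0.30, 0.28, 0.22, 0.20]

for name, dist in [("confident", confident), ("ambiguous", ambiguous)]:
    h = predictive_entropy(dist)
    print(f"{name}: H = {h:.3f} nats, flagged = {h > THRESHOLD}")
```

A uniform distribution over *k* classes attains the maximum entropy ln *k*, so any fixed threshold should be chosen relative to the number of intent classes.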
## 📈 Performance: MeowBench
Meow-Omni 1 was evaluated on the MeowBench MCQ suite (586 expert-verified samples) and achieved state-of-the-art results. Detailed leaderboard coming soon.
## 🛠️ How to Use
Meow-Omni 1 accepts four input modalities:
- Video: Behavioral context.
- Audio: Vocalization patterns.
- Time-Series: Physiological and motion signals (e.g., ECG/EEG, IMU data), injected via custom control tokens.
- Text: Instructions or questions regarding the animal's state.
```python
import torch
import soundfile as sf
import numpy as np
from PIL import Image
from decord import VideoReader, cpu

from modeling_meow_omni_1 import MeowOmni1ForCausalLM
from processing_meow_omni_1 import MeowOmni1Processor

# 1. Set up the model and processor
model_path = "smgjch/Meow-Omni-1"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = MeowOmni1Processor.from_pretrained(model_path, trust_remote_code=True)
model = MeowOmni1ForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to(device).eval()

# 2. Prepare the modality inputs
video_path = "sample_cat_video.mp4"
audio_path = "sample_cat_purr.wav"
ts_path = "sample_biometrics.json"

# Video: uniformly sample 16 frames across the clip
vr = VideoReader(video_path, ctx=cpu(0))
indices = np.linspace(0, len(vr) - 1, 16, dtype=int)
frames = [Image.fromarray(f).convert("RGB") for f in vr.get_batch(indices).asnumpy()]

# Audio: truncate to at most 480,000 samples (e.g., 30 s at 16 kHz)
audio_arr, _ = sf.read(audio_path)
audios = [audio_arr[:480000].astype(np.float32)]

# 3. Construct the prompt with modal placeholders
# Note: placeholder counts MUST match the number of input items
# (e.g., 16 image tags for 16 frames).
placeholders = (
    "".join(["<image>./</image>"] * len(frames))  # video frames
    + "<audio>./</audio>"                         # audio stream
    + "<|ts_start|><|ts_unit|><|ts_end|>"         # time-series block
)
raw_query = "Analyze the provided multi-modal data. What is this cat's intention?"
prompt = f"User: {placeholders}\n{raw_query}\nAssistant:"

# 4. Run inference
inputs = processor(
    text=[prompt],
    images=frames,
    audios=audios,
    time_series_paths=[ts_path],
    time_series_sampling_rates=[100.0],
    return_tensors="pt",
).to(device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

response = processor.tokenizer.decode(output[0], skip_special_tokens=True)
print(f"\n🔍 Meow-Omni 1 Analysis:\n{response}")
```
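The exact on-disk format expected for `sample_biometrics.json` is not documented in this card. As a placeholder for experimentation, here is a hedged sketch that writes a synthetic 1-D biometric signal under an assumed schema (`sampling_rate` plus a flat `values` array); the real processor may require a different layout:

```python
import json
import math

# Assumed schema -- verify against MeowOmni1Processor before relying on it.
SAMPLING_RATE = 100.0  # Hz, matching time_series_sampling_rates in the example above
DURATION_S = 10

# Synthetic heart-rate-like waveform: a 2 Hz sine with a slow upward drift
samples = [
    math.sin(2 * math.pi * 2.0 * t / SAMPLING_RATE) + 0.1 * t / SAMPLING_RATE
    for t in range(int(SAMPLING_RATE * DURATION_S))
]

with open("sample_biometrics.json", "w") as f:
    json.dump({"sampling_rate": SAMPLING_RATE, "values": samples}, f)

print(len(samples))  # 1000 samples = 10 s at 100 Hz
```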
## 🔗 The Meow-Omni Ecosystem
- Base Model: Meow-Omni 1-Base — The raw architectural foundation.
- Training Dataset: Meow-10K — The synchronized 10k sample training corpus.
- Evaluation Benchmark: MeowBench — The expert-verified quad-modal benchmark suite.
## 📝 Citation
Coming Soon.