Oracle850B-MoE — Mixture of Experts Language Model
Project: Oracle — a line of proprietary reasoning LLMs by M∞1 Corporation
Model: Oracle850B-MoE (850B parameters, Mixture of Experts - 128 experts)
Author: MagistrTheOne|Krasnodar|2025
Repository: MagistrTheOne/oracle850b-moe
Oracle850B-MoE is M∞1's proprietary decoder-only architecture with ≈850B total parameters (128 experts, top-k = 2, ≈180–220B active per token). OWN MODEL / NO EXTERNAL CHECKPOINTS. This repository covers data, infrastructure, and config preparation; training itself is launched on an external cluster.
🔒 Strict Rules
- NO LOCAL TRAINING. `ALLOW_LOCAL_TRAIN=false` — any training run fails with an explanatory error.
- NO EXTERNAL WEIGHTS. Links to or downloads of GPT-2/LLaMA/Mistral/Qwen/Phi/Gemma/OPT etc. are prohibited. A CI guard enforces this.
- Preparation only: code, configs, mock artifacts, dry-runs; mini-samples for pipeline verification.
- Identity: the special tokens `<|oracle_sys|>`, `<|oracle_intro|>`, and `<|author|>` are auto-injected at serve time.
🏗️ Architecture
MoE-850B Configuration
```json
{
  "model_name": "oracle850b-moe",
  "arch": "decoder-only",
  "param_total": 850000000000,
  "moe": {
    "experts": 128,
    "expert_hidden": 2816,
    "router": {"type": "topk", "k": 2, "load_balancing_loss": 0.01}
  },
  "dense": {"d_model": 8192, "n_layers": 96, "n_heads": 64, "d_ff": 24576},
  "activation": "swiglu",
  "rope_theta": 10000,
  "rotary_pct": 0.5,
  "rmsnorm_eps": 1e-5,
  "flash_attn": true,
  "kv_cache": true,
  "vocab_size": 131072,
  "max_seq_len": 16384,
  "fp": {"train": "bf16", "infer": "auto"}
}
```
Explanation: the total parameter count is ≈850B because of the expert pool; only 2 of the 128 experts are active per token, so the "active" parameter count is ~180–220B. This targets 200B-dense-class quality at a fraction of the FLOPs per token.
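To make the routing concrete, here is a minimal PyTorch sketch of top-k = 2 expert selection with a Switch-style auxiliary load-balancing loss, using the dimensions from the config above. It is illustrative only, not the repository's actual modeling code:

```python
# Minimal top-k routing sketch (illustrative, not the repo's modeling code).
# Shows how 2 of 128 experts are selected per token, per the config above.
import torch
import torch.nn.functional as F

def route(hidden: torch.Tensor, router_w: torch.Tensor, k: int = 2):
    """hidden: [tokens, d_model]; router_w: [d_model, n_experts]."""
    logits = hidden @ router_w                     # [tokens, n_experts]
    probs = F.softmax(logits, dim=-1)
    topk_p, topk_idx = probs.topk(k, dim=-1)       # 2 experts per token
    gates = topk_p / topk_p.sum(-1, keepdim=True)  # renormalized gate weights

    # Auxiliary load-balancing loss (Switch-Transformer style):
    # mean router prob per expert * fraction of tokens dispatched to it.
    n_experts = router_w.shape[1]
    dispatch = F.one_hot(topk_idx[:, 0], n_experts).float().mean(0)
    importance = probs.mean(0)
    lb_loss = n_experts * (dispatch * importance).sum()
    return topk_idx, gates, lb_loss

tokens = torch.randn(16, 8192)            # d_model = 8192
router_w = torch.randn(8192, 128) * 0.02  # 128 experts
idx, gates, lb = route(tokens, router_w, k=2)
print(idx.shape, gates.shape, lb.item())  # torch.Size([16, 2]) torch.Size([16, 2]) ...
```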
Special Tokens
- `<|oracle_sys|>` — Oracle system token
- `<|oracle_intro|>` — Oracle introductory token
- `<|author|>` — author token (MagistrTheOne|Krasnodar|2025|850B)
- `<|endoftext|>` — end of text
- `<|pad|>` — padding
- `<|unk|>` — unknown token
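The serve layer auto-injects the identity tokens into every request. A minimal sketch of what that injection might look like (the helper name and token ordering are assumptions, not the repository's actual serve code):

```python
# Hypothetical sketch of serve-time identity injection.
ORACLE_SPECIAL = {
    "sys": "<|oracle_sys|>",
    "intro": "<|oracle_intro|>",
    "author": "<|author|>",
}

def inject_identity(user_prompt: str) -> str:
    """Prefix a request with the Oracle identity tokens before tokenization."""
    header = ORACLE_SPECIAL["sys"] + ORACLE_SPECIAL["intro"] + ORACLE_SPECIAL["author"]
    return f"{header}\n{user_prompt}"

print(inject_identity("Explain MoE routing."))
```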
📊 TB-Scale Data Pipeline
Pipeline Structure
```
        ↓ ingest.py
obj://oracle-data/raw/...             # Source data
        ↓ clean_generic.py
obj://oracle-data/clean/...           # Cleaned data
        ↓ decontaminate.py
obj://oracle-data/decontaminated/...  # Decontaminated data
        ↓ shard_webdataset.py
obj://oracle-data/webdataset/...      # WebDataset shards
        ↓ stats.py
obj://oracle-data/stats/...           # Statistics and reports
```
Processing Scripts
- `ingest.py` — intake from S3/HTTPS; JSON manifest (source, license, size, hashes)
- `clean_generic.py` — Unicode normalization, dedup (MinHash/LSH), language filtering (ru/en), PII, toxicity
- `decontaminate.py` — evaluation stop-lists; intersection reports
- `shard_webdataset.py` — packaging into tar shards (e.g., 512 MB), `.idx` index, map-style access
- `stats.py` — summaries (duplicates, languages, topics, lengths)
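For the dedup step, here is a minimal sketch of MinHash/LSH near-duplicate filtering with the `datasketch` library. The shingle size and threshold are illustrative assumptions; `clean_generic.py`'s actual parameters may differ:

```python
# Sketch of MinHash/LSH near-duplicate detection (illustrative parameters).
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for shingle in {text[i:i + 5] for i in range(len(text) - 4)}:  # char 5-grams
        m.update(shingle.encode("utf-8"))
    return m

lsh = MinHashLSH(threshold=0.5, num_perm=128)  # Jaccard >= ~0.5 => near-duplicate
docs = {"a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumped over the lazy dog",
        "c": "completely unrelated text about mixture of experts"}
kept = []
for key, text in docs.items():
    m = minhash(text)
    if lsh.query(m):  # near-duplicate of an already-kept document
        continue
    lsh.insert(key, m)
    kept.append(key)
print(kept)  # likely ['a', 'c'] — 'b' dropped as a near-duplicate of 'a'
```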
🚀 Training: Parallelism and Checkpoints
Training Configuration
```yaml
seq_len: 16384
micro_bsz: 1
global_bsz: 4096
grad_accum: 512
precision: bf16
parallelism:
  tensor: 16      # TP
  pipeline: 12    # PP (stages)
  sequence: true  # SP (ops sharding)
moe:
  top_k: 2
  capacity_factor: 1.25
  zloss: 0.001
opt: adamw
lr: 8e-5
warmup_steps: 8000
max_steps: 800000
checkpoint:
  every_steps: 1000
  keep_last: 3
  s3_mirror: true
logging: json
```
Launcher Requirements
- Support for TP/PP/SP mapping across nodes/GPUs (16×TP, 12×PP)
- Elastic restart; automatic resume from the last fully saved checkpoint
- Dry-run: verify the layout without starting any math (see the sketch below)
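As a sanity check for the dry-run item above, here is a minimal sketch that validates the TP/PP mapping against the world size and the batch arithmetic from the training config. Function and argument names are illustrative, not the launcher's actual API:

```python
# Dry-run layout check: TP*PP must divide the world size; the data-parallel
# degree then determines the global batch (micro_bsz * grad_accum * DP).
def check_layout(world_size: int, tp: int = 16, pp: int = 12,
                 micro_bsz: int = 1, grad_accum: int = 512,
                 global_bsz: int = 4096) -> None:
    model_parallel = tp * pp  # GPUs holding one model replica
    assert world_size % model_parallel == 0, "world size must be a multiple of TP*PP"
    dp = world_size // model_parallel  # data-parallel replicas
    derived_global = micro_bsz * grad_accum * dp
    assert derived_global == global_bsz, (
        f"global_bsz mismatch: {derived_global} != {global_bsz}")
    print(f"TP={tp} PP={pp} DP={dp} -> global batch {derived_global}")

check_layout(world_size=16 * 12 * 8)  # 1536 GPUs -> DP=8, global batch 4096
```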
☁️ Cloud Orchestration
Terraform (Yandex Cloud)
- VPC, Object Storage, Container Registry
- Kubernetes cluster with GPU nodes
- Budget constraints and alerts
- Monitoring and logging
Helm Charts
- Charts for training and serving
- Resource configuration and tolerations
- Service accounts and RBAC
Kill Switch
- Emergency stop of all pipelines
- Terraform resource destruction
- Pre-flight checks
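A hedged sketch of what the kill switch might do. The kubectl/terraform invocations are real CLI syntax, but the namespace, paths, and script structure are assumptions, not the repository's actual `kill_switch.py`:

```python
# Hypothetical kill-switch shape: halt training workloads, then tear down infra.
import subprocess
import sys

def kill_all(namespace: str = "oracle-train", tf_dir: str = "infra/terraform") -> None:
    # 1) Emergency stop: delete training workloads in the GPU namespace.
    subprocess.run(["kubectl", "delete", "jobs,deployments", "--all",
                    "-n", namespace], check=False)
    # 2) Destroy cloud resources via Terraform (non-interactive).
    subprocess.run(["terraform", f"-chdir={tf_dir}", "destroy", "-auto-approve"],
                   check=True)

if __name__ == "__main__":
    # Pre-flight check: require explicit confirmation before destroying anything.
    if input("Type 'DESTROY' to confirm: ") != "DESTROY":
        sys.exit("aborted")
    kill_all()
```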
🛡️ CI/CD and Guards
CI Guards
- `guard_external_models.yml` — fail on mentions of `gpt2|llama|mistral|qwen|phi|gemma|opt`
- `push_to_hub.yml` — publish metadata to HF (Free/Pro via ENV)
Security Scripts
- `guard_no_local_train.py` — blocks local training
- `kill_switch.py` — emergency resource shutdown
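A minimal sketch of the core check `guard_no_local_train.py` enforces, per the strict rules above (the real script may do more):

```python
# Sketch of the local-training guard: fail fast unless explicitly allowed.
import os
import sys

def assert_no_local_train() -> None:
    if os.environ.get("ALLOW_LOCAL_TRAIN", "false").lower() != "true":
        sys.exit("Local training is disabled (ALLOW_LOCAL_TRAIN=false). "
                 "Launch training on the external cluster instead.")

if __name__ == "__main__":
    assert_no_local_train()
```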
📦 Hugging Face Hub
Publishing Strategy
- Today: push metadata (configs, tokenizer, README, MODEL_CARD)
- Tomorrow (Pro): enable `HF_HUB_ENABLE_HF_TRANSFER=1` and multi-part uploads; weights are published only after external training completes
Environment Variables
```bash
HUGGINGFACE_TOKEN=hf_***
HF_REPO=<user>/oracle850b-moe
HF_TIER=free  # switch to pro later
HF_HUB_ENABLE_HF_TRANSFER=0
```
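Under these variables, a metadata-only push can be sketched with `huggingface_hub`. The `allow_patterns`/`ignore_patterns` below are assumptions about the repo layout, not the actual `push_to_hub.yml` logic:

```python
# Sketch of a metadata-only push: configs/tokenizer/docs go up, weights stay out.
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
api.upload_folder(
    folder_path=".",
    repo_id=os.environ.get("HF_REPO", "MagistrTheOne/oracle850b-moe"),
    repo_type="model",
    allow_patterns=["configs/**", "tokenizer/**", "README.md", "MODEL_CARD.md"],
    ignore_patterns=["*.safetensors", "*.bin", "checkpoints/**"],
)
```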
🚀 Quick Start
1. Installation
```bash
# Clone the repository
git clone https://github.com/MagistrTheOne/oracle850b-moe.git
cd oracle850b-moe

# Create virtual environment
make venv
make install

# Set up environment variables
cp .env.example .env
# Edit .env with your values
```
2. Verification
```bash
# Run CI guards
make ci-guards

# Check project status
make status

# Run tests
make test
```
3. Data Preparation (dry-run)
```bash
# Run data pipeline preparation
make prep-tb

# Infrastructure planning
make infra-plan
```
4. Upload to HF Hub
```bash
# Upload metadata to Hugging Face Hub
make push-hf
```
📁 Project Structure
```
oracle850b-moe/
├─ src/oracle/core/
│  ├─ modeling/        # MoE architecture
│  ├─ tokenization/    # Custom tokenizer
│  └─ serve/           # FastAPI server
├─ configs/
│  ├─ model/           # Model configs
│  ├─ training/        # Training configs
│  ├─ deepspeed/       # DeepSpeed configs
│  └─ serve/           # Serving configs
├─ datasets/scripts/   # Data processing scripts
├─ training/           # Launcher and scheduler
├─ infra/
│  ├─ terraform/       # Yandex Cloud infrastructure
│  ├─ helm/            # Kubernetes charts
│  └─ scripts/         # Management scripts
├─ ci/                 # CI/CD pipelines
├─ scripts/            # Utilities and uploads
└─ checkpoints/        # Checkpoints and prompts
```
🔧 Makefile Commands
```bash
make help        # Show help
make prep-tb     # Run data pipeline (dry-run)
make infra-plan  # Infrastructure planning
make ci-guards   # Run CI guards
make test        # Run tests
make clean       # Clean temporary files
make kill-all    # Emergency shutdown
make push-hf     # Upload to HF Hub
```
⚠️ Limitations
- Local training prohibited — only cluster training
- External models prohibited — only proprietary architecture
- Python 3.11.9 — pinned dependency versions
- Virtual environment — dependency isolation
📞 Support
- Author: MagistrTheOne|Krasnodar|2025|850B
- Repository: https://github.com/MagistrTheOne/oracle850b-moe
- HF Hub: https://huggingface.co/MagistrTheOne/oracle850b-moe
📄 License
[License to be determined]
Disclaimer: Oracle850B is an experimental model. Use at your own risk. The author is not responsible for any consequences of use.
Evaluation results (self-reported)
- GSM8K pass@1 (clean eval): no published value
- HumanEval pass@1 (clean eval): no published value