---
license: apache-2.0
pipeline_tag: zero-shot-classification
tags:
- chemistry
- biology
- art
---

# Pentachora Adaptive Encoded (Multi-Channel) - NOTEBOOK 2 of 5

**A geometry-regularized classifier with a 5-frequency encoder and pentachoron constellation heads.**

*Author:* **AbstractPhil** · *Quartermaster:* **Mirel** · GPT-4o, GPT-5, GPT-5 Fast, GPT-5 Thinking, GPT-5 Pro

*Assistants:* Claude Opus 4.1, Claude Sonnet 4, Gemini 2.5

*License:* **Apache-2.0**

---
## 📌 TL;DR

This repository hosts training runs of a **frequency-aware encoder** (PentaFreq) paired with a **pentachoron constellation classifier** (dispatchers + specialists). The model blends classic cross-entropy with **two contrastive objectives** (dual InfoNCE and **ROSE-weighted** InfoNCE) and a **geometric regularizer** that keeps the learned vertex geometry sane.

It supports **1-channel and 3-channel** 28×28 inputs (e.g., TorchVision MNIST variants and MedMNIST 2D sets), is **seeded/deterministic**, and ships full artifacts (weights, plots, history, TensorBoard) for review.

---
## Author's Notes

- Yes, I am human, and this is an AI-generated model card, so it's probably going to be a little inaccurate. It just looks better than mine would.
- This is design 2 of 5; the AI seems to always forget, so here is a reminder up front because I probably won't edit it later. It has some odd stuff that doesn't matter, because this isn't the best one.
- Cataloging this model is important nonetheless, as it's a stepping stone to the more powerful geometric crystallization collective.
- I will include citations to the adjacent papers used for the mathematics, model weights, inspirations, and test methodologies at a later time.
- I appreciate every single contributor to this, direct or indirect, for your invaluable contributions to science that manifested in utilizable AI form.
- I have included the training notebook as `train_notebook.ipynb`, which shows the deterministic setup, the weights, the loss methods, and an absolute ton of random functions that I let the AIs monkey-patch in because it's faster than trying to teach an AI 15 classes in 15 files.

## 🧠 Model overview

### Architecture

- **PentaFreq Encoder (multi-channel)**
  - 5 spectral branches (ultra-high, high, mid, low-mid, low) → per-branch encoders → cross-attention → MLP fusion → **normalized latent `z`** (a minimal sketch of this flow follows the bullets below).
  - Channel-aware: supports **C ∈ {1,3}**; input is flattened to `C×28×28`.

- **Pentachoron Constellation Classifier**
  - **Two stacks** (dispatchers & specialists), each containing **pentachora** (5-vertex simplices) with learnable vertices.
  - **Coherence gate** modulates vertex logits; **group heads** (one per vertex) score class subsets; **pair aggregation** + fusion MLP produce the final logits.
  - Geometry terms encourage valid simplex structure and separation between the two stacks.
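
To make the encoder's data flow concrete, here is a minimal sketch. `SimplePentaFreq` and its internals are hypothetical stand-ins, not the notebook's `PentaFreqEncoderV2`; the sketch only illustrates the five-branch encode → cross-attend → fuse → normalize path for a flattened input.

```python
# Minimal sketch (illustrative only; not the repo's PentaFreqEncoderV2).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePentaFreq(nn.Module):
    def __init__(self, input_dim=28 * 28, latent_dim=56, num_bands=5):
        super().__init__()
        # One small encoder per spectral band (ultra-high ... low).
        self.band_encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, 128), nn.GELU(), nn.Linear(128, latent_dim))
             for _ in range(num_bands)]
        )
        # Cross-attention over the five band tokens, then MLP fusion.
        self.attn = nn.MultiheadAttention(latent_dim, num_heads=2, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.GELU(),
                                  nn.Linear(latent_dim, latent_dim))

    def forward(self, x_flat):                  # x_flat: [B, C*28*28]
        tokens = torch.stack([enc(x_flat) for enc in self.band_encoders], dim=1)  # [B, 5, D]
        attended, _ = self.attn(tokens, tokens, tokens)                           # [B, 5, D]
        z = self.fuse(attended.mean(dim=1))                                       # [B, D]
        return F.normalize(z, dim=-1)           # normalized latent z

z = SimplePentaFreq()(torch.rand(8, 28 * 28))
print(z.shape)  # torch.Size([8, 56])
```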
					
						
					
						
### Objective

- **CE** – main cross-entropy on logits.
- **Dual InfoNCE (stable)** – encourages `z` to match the **correct vertex** across both stacks.
- **ROSE-weighted InfoNCE (stable)** – same idea, but reweights samples by an analytic **ROSE** similarity (triadic cosine + magnitude).
- **Geometry Regularization** – stable Cayley–Menger **proxy** (eigval-based), edge variance, center separation, and a **soft radius control**; ramped in early epochs.

> All contrastive losses use `log_softmax` + `gather` to avoid `inf − inf` traps (a minimal sketch follows); all paths **nan-sanitize** defensively.
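
A minimal sketch of that stable pattern, as a simplification of my own rather than the notebook's exact dual/ROSE losses: similarities go through `log_softmax`, then the positive column is picked out with `gather`, so no explicit `exp`/`log` round trip can produce `inf − inf`.

```python
# Minimal sketch of the log_softmax + gather InfoNCE pattern (assumption:
# the repo's actual dual / ROSE-weighted losses add more terms around this).
import torch
import torch.nn.functional as F

def stable_info_nce(z, vertices, target_idx, temperature=0.07):
    """z: [B, D] normalized latents; vertices: [K, D] vertex anchors;
    target_idx: [B] index of the correct vertex for each sample."""
    logits = z @ vertices.t() / temperature              # [B, K] similarities
    log_probs = F.log_softmax(logits, dim=-1)            # numerically safe
    pos = log_probs.gather(1, target_idx.unsqueeze(1))   # [B, 1] positive term
    return -(pos.squeeze(1)).nan_to_num().mean()         # defensive nan guard

loss = stable_info_nce(F.normalize(torch.randn(8, 56), dim=-1),
                       F.normalize(torch.randn(10, 56), dim=-1),
                       torch.randint(0, 10, (8,)))
print(loss)
```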
					
						
					
						
### Determinism

- Global seeding (Python/NumPy/Torch), deterministic DataLoader workers, generator-seeded samplers; cuDNN deterministic & TF32 off (see the sketch after this list).
- Optional strict mode (`torch.use_deterministic_algorithms(True)`) and deterministic cuBLAS.
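
A minimal sketch of that recipe; the notebook's actual helper may differ in naming and in the seed it uses:

```python
# Minimal determinism setup sketch (assumed seed value; adjust to the run's config).
import os, random
import numpy as np
import torch

def set_determinism(seed: int = 42, strict: bool = False):
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True        # cuDNN deterministic
    torch.backends.cudnn.benchmark = False
    torch.backends.cuda.matmul.allow_tf32 = False    # TF32 off
    torch.backends.cudnn.allow_tf32 = False
    if strict:
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # deterministic cuBLAS
        torch.use_deterministic_algorithms(True)           # strict mode

def seed_worker(worker_id):
    # Pass as DataLoader(worker_init_fn=seed_worker) for deterministic workers.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

set_determinism(42)
```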
					
						
					
						
---

## 🗂️ Repository layout per run

Each training run uploads a complete bundle at:

```
<repo>/<root>/<DatasetName>/<Timestamp_or_best>/
  weights/
    encoder[_<Dataset>].safetensors
    constellation[_<Dataset>].safetensors
    diagnostic_head[_<Dataset>].safetensors
  config.json               # exact config used
  manifest.json             # env, params, dataset, best metrics
  history.json / history.csv
  tensorboard/ (+ zip)
  plots/                    # accuracy, loss components, lambda, confusion matrices
```

> We also optionally publish a **`best/`** alias inside each dataset folder pointing to the current champion.
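
To pull one run bundle locally you can use `huggingface_hub`; the `allow_patterns` value below is illustrative, so substitute the dataset folder and timestamp (or the optional `best/` alias) you actually want:

```python
# Hypothetical example: download only one dataset's `best/` run from the repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="AbstractPhil/pentachora-multi-channel-frequency-encoded",
    allow_patterns=["*/MNIST/best/*"],   # assumed layout; adjust to the folder you want
    local_dir="./pentachora_run",
)
print(local_dir)
```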
					
						
					
						
---

## 🧩 Intended use & use cases

**Intended use**: research-grade supervised classification and geometry-regularized representation learning on small images (28×28) across gray and color channels.

**Example use cases**

- **Benchmarking** on the MNIST family / MedMNIST 2D sets with defensible, reproducible training and complete artifacts.
- **Geometry-aware representation learning**: analyze how simplex vertices move, how the gate allocates probability mass, and how geometry regularization affects generalization.
- **Class routing / specialization**: per-vertex group heads provide an interpretable split of classes; confusion-driven vertex reweighting helps diagnose hard groups.
- **Curriculum & loss ablations**: toggle ROSE, dual InfoNCE, or geometry terms to study their marginal value under a controlled seed.
- **OOD “pressure tests”** (research): ROSE magnitude and routing entropy can be used as quick signals of uncertainty (not calibrated).
- **Education & reproducibility**: the runs are fully seeded, include TensorBoard logs and plots, and use safe numerical formulations.

					
						
---

## 🚫 Out-of-scope / limitations

- **Not a medical device** – even if trained on MedMNIST subsets, this is not a diagnostic tool. Don’t use it for clinical decisions.
- **Input size** is 28×28; higher-resolution domains require retraining and likely architecture tweaks.
- **Dataset bias / shift** – performance depends on the underlying distribution. Evaluate before deployment.
- **Calibration** – logits are not guaranteed to be calibrated. For decision thresholds, use a validation set or post-hoc calibration.
- **Robustness** – robustness to adversarial perturbations is not a design goal here.

---
					
						
## 📈 Example results (single-seed snapshots)

> Numbers below are indicative, from our seeded runs with `img_size=28`, a size-aware LR schedule, and the regularization ramp; see `manifest.json` in each run for exact details.

| Dataset        | C | Best Test Acc | Epoch | Notes                           |
|----------------|---|--------------:|------:|---------------------------------|
| MNIST/Fashion* | 1 | 0.97–0.98     | 15–25 | stable losses + reg ramp        |
| BloodMNIST     | 3 | ~0.95–0.97+   | 20–30 | color preserved, 28×28          |
| EMNIST (bal)   | 1 | 0.88–0.92     | 25–45 | many classes; pairs auto-scaled |

\* Depending on which of the pair (MNIST / FashionMNIST) is selected.
Consult each dataset folder’s `history.csv` for the full learning curve and the **current best** accuracy.
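
For example, a quick way to pull the best accuracy back out of a downloaded run; the column name below is an assumption, so check the CSV header, which comes straight from the notebook's logger:

```python
# Minimal sketch for inspecting a run's history.csv (column names are assumed).
import pandas as pd

hist = pd.read_csv("history.csv")
print(hist.columns.tolist())          # inspect the logged columns first
acc_col = "test_acc"                  # adjust to the actual column name
best_epoch = int(hist[acc_col].idxmax())
print(hist[acc_col].max(), best_epoch)
```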
					
						
					
						
---

## 🔧 How to use (PyTorch)

```python
import torch
from safetensors.torch import load_file as load_safetensors

# --- load weights (example path) ---
ENC = "weights/encoder_MNIST.safetensors"
CON = "weights/constellation_MNIST.safetensors"
DIA = "weights/diagnostic_head_MNIST.safetensors"

# Recreate model classes (identical definitions to the notebook)
encoder = PentaFreqEncoderV2(input_dim=28*28, input_ch=1, base_dim=56, num_heads=2, channels=12)
constellation = BatchedPentachoronConstellation(num_classes=10, dim=56, num_pairs=5, lambda_sep=0.391)
diag = RoseDiagnosticHead(56)

encoder.load_state_dict(load_safetensors(ENC))
constellation.load_state_dict(load_safetensors(CON))
diag.load_state_dict(load_safetensors(DIA))

encoder.eval(); constellation.eval()

# --- dummy inference ---
# x: [B, C, H, W] as a float tensor in [0, 1]; flatten to [B, C*H*W].
# Use the same normalization as training for best performance.
x = torch.rand(8, 1, 28, 28)
x_flat = x.view(x.size(0), -1)

with torch.no_grad():
    z = encoder(x_flat)                    # [B, D]
    logits, diag_out = constellation(z)    # [B, num_classes]
    pred = logits.argmax(dim=1)

print(pred)
```

> To reproduce training, see `config.json` and `history.csv`; all recipes are encoded in the flagship notebook used for these runs.

---
					
						
					
						
## 🔬 Training procedure (default)

- **Optimizer**: AdamW (β1=0.9, β2=0.999), size-aware LR (≈2e-2 by default)
- **Schedule**: 10% **warmup** → cosine to `lr_min=1e-6` (see the sketch after this list)
- **Batch size**: up to 2048 (fits on T4/A100 at 28×28)
- **Loss**: CE + Dual InfoNCE + ROSE InfoNCE + Geometry Reg (ramped) + Diag MSE
- **Determinism**: seeds for Python/NumPy/Torch (CPU/GPU), deterministic DataLoader workers and samplers, cuDNN deterministic, TF32 off
- **Numerical safety**: log-softmax contrastive losses, eigval CM proxy, `nan_to_num` guards, optional step rollback if non-finite
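
A minimal sketch of the warmup → cosine schedule, assuming the listed defaults (`lr≈2e-2`, `lr_min=1e-6`, 10% warmup); the notebook's size-aware scheduler may differ in detail:

```python
# Warmup -> cosine LR sketch (assumed step counts; the model below is a stand-in).
import math
import torch

model = torch.nn.Linear(56, 10)
opt = torch.optim.AdamW(model.parameters(), lr=2e-2, betas=(0.9, 0.999))

total_steps, warmup_steps, lr_min, lr_max = 1000, 100, 1e-6, 2e-2

def lr_lambda(step):
    if step < warmup_steps:                           # linear warmup over first 10%
        return (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cos = 0.5 * (1 + math.cos(math.pi * t))           # cosine decay toward lr_min
    return (lr_min + (lr_max - lr_min) * cos) / lr_max

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
for step in range(total_steps):
    opt.step()                                        # (actual training step elided)
    sched.step()
```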
					
						
					
						
---

## 📈 Evaluation

- Main metric: **top-1 accuracy** on the held-out test split defined by each dataset.
- Diagnostics we log (a routing-entropy sketch follows this list):
  - **Routing entropy** and vertex probabilities
  - **ROSE** magnitudes
  - Confusion matrices (per epoch and “best”)
  - λ (geometry ↔ attention gate) over epochs
  - Full loss decomposition
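
For instance, the routing-entropy diagnostic can be computed from per-vertex probabilities as below; this is a stand-alone sketch with a random stand-in tensor, whereas in the real runs the probabilities come out of the constellation's gate:

```python
# Routing entropy sketch (stand-in logits; higher entropy = less decisive routing).
import torch
import torch.nn.functional as F

vertex_logits = torch.randn(8, 5)                                # [B, 5 vertices]
p = F.softmax(vertex_logits, dim=-1)
routing_entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=-1)    # [B], in nats
print(routing_entropy.mean())
```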
					
						
					
						
---

## 🔭 Potential for growth

- **Hypercube constellations** (classes already shipped in the notebook): scale from the 4-simplex to n-cube graphs; compare geometry families.
- **Multi-resolution** (56→128→256 latent; 28→64→128 images); add pyramid encoders.
- **Self-distillation / semi-supervised**: use ROSE as a confidence-weighted pseudo-labeling signal.
- **Better routing**: learned vertex priors per class, entropy regularization, temperature schedules.
- **Calibration & OOD**: temperature scaling / Dirichlet heads; exploit ROSE magnitude and gating entropy for improved uncertainty estimates (see the sketch after this list).
- **Deployment adapters**: ONNX / TorchScript exports; small mobile variants of PentaFreq.
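
For the calibration item, a standard post-hoc temperature-scaling fit (a generic technique, not something shipped in this repo) would look roughly like this, given held-out validation logits and labels:

```python
# Post-hoc temperature scaling sketch (generic method; random stand-in data).
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, iters=200, lr=0.01):
    """logits: [N, C] validation logits; labels: [N]. Returns a scalar T > 0."""
    log_t = torch.zeros(1, requires_grad=True)        # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

T = fit_temperature(torch.randn(512, 10), torch.randint(0, 10, (512,)))
calibrated_probs = F.softmax(torch.randn(8, 10) / T, dim=-1)
```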
					
						
					
						
---

## ⚖️ Ethical considerations & implications

- **Clinical datasets** (MedMNIST) are simplified proxies; they don’t reflect clinical complexity or demographic coverage.
- **Downstream use** must include dataset-appropriate validation and calibration; this model is for **research** only.
- **Data bias** and **label noise** can be amplified by strong geometry priors; review confusion matrices and per-class accuracies before claiming improvements.
- **Positive implications**: the constellation design offers a **transparent, analyzable structure** (per-vertex heads, explicit geometry), easing **interpretability** and **ablation**.

---

## 🔁 Reproducibility

- `config.json` contains all hyperparameters used for each run (see the sketch after this list).
- `manifest.json` logs the environment: Python, Torch, CUDA, GPU, RAM, and parameter counts.
- Seeds and determinism flags are printed in logs and set in code.
- `history.csv` + TensorBoard fully specify the learning trajectory.
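
For example, from inside one run folder you can reload exactly what was used; the key names inside the JSON are whatever the notebook wrote, so inspect them rather than assuming:

```python
# Minimal sketch: read back a run's logged config and environment manifest.
import json

with open("config.json") as f:
    config = json.load(f)
with open("manifest.json") as f:
    manifest = json.load(f)

print(sorted(config.keys()))      # exact hyperparameters used for the run
print(sorted(manifest.keys()))    # environment, parameter counts, best metrics
```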
					
						
					
						
---

## 🧾 License

**Apache License 2.0** – see `LICENSE`.

---

## 📣 Citation

If you use this work, please cite:

```
@software{abstractphil_pentachora_2025,
  author  = {AbstractPhil and Mirel},
  title   = {Pentachora Adaptive Encoded: Geometry-Regularized Classification with PentaFreq},
  year    = {2025},
  license = {Apache-2.0},
  url     = {https://huggingface.co/AbstractPhil/pentachora-multi-channel-frequency-encoded}
}
```
					
						
					
						
---

## 🛠️ Changelog (excerpt)

- **2025-08**: Flagship notebook stabilized (stable losses, eigval CM proxy, NaN rollback, deterministic sweep).
- **2025-08**: Multi-channel PentaFreq; per-dataset HF folders with full artifacts; optional `best/` alias.
- **2025-08**: Hypercube constellation classes added for follow-up experiments.

---

## 💬 Contact

- **Author:** @AbstractPhil
- **Quartermaster:** Mirel (ChatGPT – GPT-5 Thinking)
- **Issues / questions:** open a Discussion on the HF repo or ping the author