Or maybe it was the best idea
I think I found the answer to the performance dips.
From 75-second epochs to 15-second epochs. The reduce-overhead compile mode most likely works as well, but I'm using standard compilation.
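For context, this is the call pattern being compared. This is a minimal sketch, not the actual training code; the toy model and shapes are made up. "Standard compilation" here means the default mode, while "reduce-overhead" trades a longer warmup for lower per-step launch overhead.

```python
import torch

# Hypothetical stand-in model; the real one is not shown in these notes.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 64),
)

# Default ("standard") compilation vs. the reduce-overhead mode.
compiled_default = torch.compile(model)
compiled_low_overhead = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 64)
y_eager = model(x)  # eager baseline for comparison
# compiled_default(x) would trigger compilation on the first call (warmup
# cost), then run the optimized graph on subsequent calls.
```

Compilation is lazy, so the `torch.compile` call itself is cheap; the cost lands on the first forward pass.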
Switched back to the anchored momentum system, which proved to be accurate, fast, and good for rapid experiments.
The key difference between this version and the last is a shared global constellation rather than local constellations. The local constellations caused enough problems that I went back to the tested global constellation structure and am working from there.
Cross your fingers. I think this might work.
Yeah a transformer was a bad idea
I knew the geolip-transformer wasn't ready to be built. The optimization isn't there yet, and the components are too thick.
I took a shot anyway, and my foot took the hit.
At a model dim of 256, enough to house the entirety of a geolip flow-matching ensemble today, the architecture requires 90 GB of VRAM to train. The backward pass explodes because of the potentials in the grads, and even then the grads STILL don't yield what they need to.
I'm going to need an SVD autoencoder of some sort, if I can make that work at all. It's highly complex, and it may not produce stable VAE relations.
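One reading of the SVD-autoencoder idea is a truncated SVD used as a linear encoder/decoder pair, compressing features into a low-rank bottleneck before the expensive parts of the network. This is a sketch of my interpretation, not the actual design; the rank `k` and the data shapes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 256))  # hypothetical batch of 256-dim features

k = 32  # bottleneck rank (assumption)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def encode(x):
    # Project onto the top-k right singular vectors.
    return x @ Vt[:k].T

def decode(z):
    # Map latents back into the original feature space.
    return z @ Vt[:k]

Z = encode(X)        # (512, 32) latent codes
X_hat = decode(Z)    # best rank-k linear reconstruction of X
```

Unlike a VAE, this pair is purely linear and has no learned stochastic latent, which is exactly where the "stable VAE relations" concern would come in if the bottleneck were made probabilistic.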
As of right this minute, the geolip-transformer is on hold, and I will revisit it very soon with the correct optimizations.
Probably was a bad idea
After a long session of debugging and MULTIPLE versions under the radar, we have a new one.
The full graph is riddled with bugs; I'm working it out.
I may have gone a little overboard.
