Towards Scalable Pre-training of Visual Tokenizers for Generation
MiniMax
Submitted by Jingfeng Yao