view article Article Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models By wubingheng and 2 others • 19 days ago • 5