TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies

Published in CVPR 2026

Recommended citation: Guang Liang, Jie Shao, Ningyuan Tang, Xinyao Liu, Jianxin Wu. TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies. CVPR 2026. https://arxiv.org/abs/2511.23225

TWEO systematically addresses the extreme-outlier problem in Transformer activations, improving stability and training speed in general model pretraining while reducing quantization loss. It generalizes well across domains, delivering significant throughput improvements and inference performance gains on both vision and language foundation models, and has been validated on thousand-GPU clusters and multi-billion-parameter model pretraining.

Download paper here