Publication

Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization

Josef van Genabith, Hongfei Xu, Qiuhui Liu, Jingyi Zhang

Not specified.

Abstract

..

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz