How Smooth Is Attention? - Apple Machine Learning Research

Self-attention and masked self-attention are at the heart of Transformers’ outstanding success. Still, our mathematical understanding of attention is incomplete, in particular our understanding of its Lipschitz properties, which are key to analyzing robustness and expressive power. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length and layer normalization on the local Lipschitz constant of both unmasked and masked self-attention. In particular, we show that for inputs of length n in any compact set, the Lipschitz constant of self-attention is bounded by √n up to a constant factor, and that this bound is tight for reasonable sequence lengths. When the sequence length n is too large for the previous bound to be tight, a setting we refer to as the mean-field regime, we provide an upper bound and a matching lower bound, both independent of n. Our mean-field framework for masked self-attention is novel and of independent interest. Our experiments on pretrained and randomly initialized BERT and GPT-2 support our theoretical findings.
Figure 1: Regularity of the attention layer as a function of sequence length for different architectures.
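
As a rough illustration of the quantity studied above, the following pure-Python sketch estimates the local Lipschitz constant of attention at a single input. It does this by taking the spectral norm of the Jacobian, with the input and output treated as flattened vectors under the Euclidean (Frobenius) norm. It is not the authors' code and does not reproduce their bounds. Several choices are assumptions made only for this example: the single-head parametrization f(X) = softmax(X A X^T) X V with matrices A and V, the causal-mask option used for masked attention, the finite-difference step, and all helper names (`self_attention`, `jacobian_fd`, `spectral_norm`, `local_lipschitz_estimate`). The Jacobian norm at one point only lower-bounds the Lipschitz constant over any set containing that point.

```python
# Illustrative sketch, not the paper's code: the form softmax(X A X^T) X V,
# the causal mask, and every helper name below are assumptions.
import math
import random


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(x, a_mat, v_mat, causal=False):
    """Single-head attention f(X) = softmax(X A X^T) X V, softmax applied per row.

    With causal=True, token i only attends to tokens j <= i (masked attention).
    """
    scores = matmul(matmul(x, a_mat), transpose(x))
    n = len(x)
    probs = []
    for i, row in enumerate(scores):
        if causal:
            weights = softmax(row[: i + 1])
            probs.append(weights + [0.0] * (n - i - 1))
        else:
            probs.append(softmax(row))
    return matmul(matmul(probs, x), v_mat)


def jacobian_fd(x, a_mat, v_mat, causal=False, h=1e-5):
    """Central-difference Jacobian of vec(f(X)) w.r.t. vec(X), both flattened row by row."""
    n, d = len(x), len(x[0])
    base = [v for row in x for v in row]

    def f_flat(vec):
        xs = [vec[i * d:(i + 1) * d] for i in range(n)]
        return [v for row in self_attention(xs, a_mat, v_mat, causal) for v in row]

    columns = []
    for k in range(n * d):
        plus, minus = base[:], base[:]
        plus[k] += h
        minus[k] -= h
        fp, fm = f_flat(plus), f_flat(minus)
        columns.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    return transpose(columns)  # rows index outputs, columns index inputs


def spectral_norm(j, iters=100, seed=0):
    """Largest singular value of j, via power iteration on J^T J."""
    rng = random.Random(seed)
    n_rows, n_cols = len(j), len(j[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    sigma = 0.0
    for _ in range(iters):
        norm_v = math.sqrt(sum(t * t for t in v))
        if norm_v == 0.0:
            return 0.0
        v = [t / norm_v for t in v]
        jv = [sum(row[c] * v[c] for c in range(n_cols)) for row in j]
        sigma = math.sqrt(sum(t * t for t in jv))  # ||J v|| with ||v|| = 1
        v = [sum(j[r][c] * jv[r] for r in range(n_rows)) for c in range(n_cols)]
    return sigma


def local_lipschitz_estimate(x, a_mat, v_mat, causal=False):
    """Jacobian spectral norm at X: a lower bound on the local Lipschitz constant."""
    return spectral_norm(jacobian_fd(x, a_mat, v_mat, causal))


if __name__ == "__main__":
    rng = random.Random(1)
    eye = [[1.0, 0.0], [0.0, 1.0]]
    for n in (4, 8, 16, 32):
        x = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(n)]
        print(n,
              round(local_lipschitz_estimate(x, eye, eye), 3),
              round(local_lipschitz_estimate(x, eye, eye, causal=True), 3))
```

For each n, the script prints the estimate for unmasked and masked attention at one random input. That input is not chosen adversarially, so the numbers are only lower bounds and should not be read as a check of the √n scaling. Finite differences keep the sketch dependency-free. An autodiff tool such as JAX's `jax.jacfwd` would give exact Jacobians instead.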
