Norm of Mean Contextualized Embeddings Determines their Variance

Yamagiwa, Hiroaki; Shimodaira, Hidetoshi

arXiv:2409.11253 (cs)

[Submitted on 17 Sep 2024 (v1), last revised 17 Dec 2024 (this version, v2)]

Abstract: Contextualized embeddings vary by context, even for the same token, and form a distribution in the embedding space. To analyze this distribution, we focus on the norm of the mean embedding and the variance of the embeddings. In this study, we first demonstrate that these values follow the well-known formula for variance in statistics and provide an efficient sequential computation method. Then, by observing embeddings from intermediate layers of several Transformer models, we found a strong trade-off relationship between the norm and the variance: as the mean embedding becomes closer to the origin, the variance increases. This trade-off is likely influenced by the layer normalization mechanism used in Transformer models. Furthermore, when the sets of token embeddings are treated as clusters, we show that the variance of the entire embedding set can theoretically be decomposed into the within-cluster variance and the between-cluster variance. We found experimentally that as the layers of Transformer models deepen, the embeddings move farther from the origin, the between-cluster variance relatively decreases, and the within-cluster variance relatively increases. These results are consistent with existing studies on the anisotropy of the embedding spaces across layers.
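The abstract names two identities without stating them on this page. The sketch below illustrates both under standard textbook definitions, which are an assumption here and may differ in detail from the paper's notation: the variance of a set of vectors is taken as the mean squared distance to their mean, so it equals the mean squared norm minus the squared norm of the mean, and the total variance of a clustered set splits into a within-cluster and a between-cluster part. The names `RunningStats`, `decompose`, `centroid`, and `sq_dist` are hypothetical helpers, and the data are random vectors rather than real contextualized embeddings.

```python
import math
import random


class RunningStats:
    """Streaming statistics for one token's embeddings (hypothetical helper)."""

    def __init__(self, dim):
        self.n = 0
        self.total = [0.0] * dim   # running sum of the vectors
        self.sq_norm_total = 0.0   # running sum of squared norms

    def update(self, x):
        # One pass over the data: no need to store earlier embeddings.
        self.n += 1
        self.total = [t + xi for t, xi in zip(self.total, x)]
        self.sq_norm_total += sum(xi * xi for xi in x)

    def mean(self):
        return [t / self.n for t in self.total]

    def variance(self):
        # Assumed identity: Var = mean of ||x||^2 minus ||mean||^2.
        return self.sq_norm_total / self.n - sum(m * m for m in self.mean())


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def decompose(clusters):
    """Return (total, within, between) variance for a list of clusters."""
    all_points = [x for c in clusters for x in c]
    n = len(all_points)
    grand = centroid(all_points)
    total = sum(sq_dist(x, grand) for x in all_points) / n
    within = 0.0
    between = 0.0
    for c in clusters:
        mu = centroid(c)
        within += sum(sq_dist(x, mu) for x in c) / n
        between += len(c) * sq_dist(mu, grand) / n
    return total, within, between


# Synthetic stand-in for three token clusters of 4-dimensional embeddings.
random.seed(0)
clusters = [
    [[random.gauss(mu, 1.0) for _ in range(4)] for _ in range(k)]
    for mu, k in [(0.0, 50), (2.0, 30), (-1.0, 20)]
]

# Sequential computation for the first cluster versus the direct definition.
stats = RunningStats(dim=4)
for x in clusters[0]:
    stats.update(x)
direct = sum(sq_dist(x, centroid(clusters[0])) for x in clusters[0]) / len(clusters[0])

total, within, between = decompose(clusters)
print(math.isclose(stats.variance(), direct, rel_tol=1e-9))
print(math.isclose(total, within + between, rel_tol=1e-9))
```

Keeping only a count, a vector sum, and a scalar sum of squared norms is one simple way to realize a sequential computation; for very large or poorly scaled data, a numerically more stable update scheme may be preferable.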

Comments:	COLING 2025
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2409.11253 [cs.CL]
	(or arXiv:2409.11253v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2409.11253

Submission history

From: Hiroaki Yamagiwa
[v1] Tue, 17 Sep 2024 15:02:23 UTC (11,188 KB)
[v2] Tue, 17 Dec 2024 07:07:52 UTC (13,373 KB)
