Wasserstein Distances, Neuronal Entanglement, and Sparsity

Sawmya, Shashata; Kong, Linghao; Markov, Ilia; Alistarh, Dan; Shavit, Nir

arXiv:2405.15756 (cs)

[Submitted on 24 May 2024 (v1), last revised 26 Feb 2025 (this version, v4)]

Abstract: Disentangling polysemantic neurons is at the core of many current approaches to interpretability of large language models. Here we attempt to study how disentanglement can be used to understand performance, particularly under weight sparsity, a leading post-training optimization technique. We suggest a novel measure for estimating neuronal entanglement: the Wasserstein distance of a neuron's output distribution to a Gaussian. Moreover, we show the existence of a small number of highly entangled "Wasserstein Neurons" in each linear layer of an LLM, characterized by their highly non-Gaussian output distributions, their role in mapping similar inputs to dissimilar outputs, and their significant impact on model accuracy. To study these phenomena, we propose a new experimental framework for disentangling polysemantic neurons. Our framework separates each layer's inputs to create a mixture of experts where each neuron's output is computed by a mixture of neurons of lower Wasserstein distance, each better at maintaining accuracy when sparsified without retraining. We provide strong evidence that this is because the mixture of sparse experts is effectively disentangling the input-output relationship of individual neurons, in particular the difficult Wasserstein neurons.
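The entanglement measure named in the abstract, the Wasserstein distance of a neuron's output distribution to a Gaussian, can be illustrated with a small sketch. This is not the authors' implementation: the function name `wasserstein_to_gaussian`, the choice of the order-1 distance, the standardization to zero mean and unit variance, and the synthetic "neuron outputs" are all assumptions made for illustration. It relies on the fact that in one dimension the Wasserstein-1 distance equals the integral of the absolute difference between the two quantile functions, approximated here at midpoint quantile levels.

```python
import random
from statistics import NormalDist, fmean, pstdev


def wasserstein_to_gaussian(outputs):
    """Approximate the 1-D Wasserstein-1 distance between the standardized
    empirical distribution of `outputs` and a standard normal N(0, 1).

    Hypothetical helper for illustration; the standardization step and the
    order-1 distance are assumptions, not details taken from the paper.
    """
    n = len(outputs)
    mu = fmean(outputs)
    sigma = pstdev(outputs)
    if sigma == 0:
        raise ValueError("outputs are constant; cannot standardize")

    # Empirical quantiles: the sorted, standardized samples.
    z = sorted((x - mu) / sigma for x in outputs)

    # Matching Gaussian quantiles at the midpoint levels (i + 0.5) / n.
    std_normal = NormalDist()
    q = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

    # Midpoint-rule approximation of the integral of |F^-1(p) - G^-1(p)|.
    return sum(abs(a - b) for a, b in zip(z, q)) / n


# Synthetic stand-ins for neuron outputs (assumed data, not real activations).
rng = random.Random(0)
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(5000)]
bimodal = [rng.gauss(-3.0, 0.5) if rng.random() < 0.5 else rng.gauss(3.0, 0.5)
           for _ in range(5000)]

d_gauss = wasserstein_to_gaussian(gaussian_like)
d_bimodal = wasserstein_to_gaussian(bimodal)
print(f"Gaussian-like outputs: {d_gauss:.4f}")
print(f"Bimodal outputs:       {d_bimodal:.4f}")
```

Under these assumptions, a neuron whose outputs are close to Gaussian scores near zero, while a strongly bimodal output distribution scores noticeably higher, which is the sense in which a high score flags a candidate "Wasserstein neuron".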

Comments:	10 pages, 9 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.15756 [cs.LG]
	(or arXiv:2405.15756v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.15756

Submission history

From: Shashata Sawmya
[v1] Fri, 24 May 2024 17:51:39 UTC (5,263 KB)
[v2] Mon, 24 Jun 2024 22:14:42 UTC (5,263 KB)
[v3] Mon, 17 Feb 2025 01:06:24 UTC (11,484 KB)
[v4] Wed, 26 Feb 2025 17:32:10 UTC (11,484 KB)
