$S^3$ -- Semantic Signal Separation

Kardos, Márton; Kostkan, Jan; Vermillet, Arnault-Quentin; Nielbo, Kristoffer; Enevoldsen, Kenneth; Rocca, Roberta

arXiv:2406.09556 (cs)

[Submitted on 13 Jun 2024 (v1), last revised 19 May 2025 (this version, v3)]

Abstract: Topic models are useful tools for discovering latent semantic structures in large textual corpora. Recent efforts have been oriented at incorporating contextual representations in topic modeling and have been shown to outperform classical topic models. These approaches are typically slow, volatile, and require heavy preprocessing for optimal results. We present Semantic Signal Separation ($S^3$), a theory-driven topic modeling approach in neural embedding spaces. $S^3$ conceptualizes topics as independent axes of semantic space and uncovers these by decomposing contextualized document embeddings using Independent Component Analysis. Our approach provides diverse and highly coherent topics, requires no preprocessing, and is demonstrated to be the fastest contextual topic model, being, on average, 4.5x faster than the runner-up BERTopic. We offer an implementation of $S^3$, and all contextual baselines, in the Turftopic Python package.
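The decomposition step described in the abstract can be illustrated with a small sketch. This is not the Turftopic implementation: it is a minimal NumPy FastICA-style routine (whitening followed by a symmetric fixed-point update with a tanh nonlinearity), run on synthetic vectors that stand in for document embeddings. The function names `fit_ica` and `transform`, the toy vocabulary, and the idea of scoring terms by passing their embeddings through the same unmixing matrix are assumptions made for illustration only.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ W


def fit_ica(X, n_components, n_iter=500, tol=1e-8, seed=0):
    """Estimate an unmixing matrix for X of shape (n_samples, n_features).

    Returns (mean, unmixing) such that (X - mean) @ unmixing.T gives the
    estimated independent components, one column per component.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean
    # Whitening: project onto the top principal axes and rescale to unit variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    K = (Vt[:n_components] / S[:n_components, None]) * np.sqrt(n)
    Z = Xc @ K.T
    # Symmetric fixed-point iteration with g = tanh.
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        G_prime = 1.0 - G ** 2
        W_new = (G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged when every row is (up to sign) unchanged.
        delta = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return mean, W @ K


def transform(X, mean, unmixing):
    """Express embeddings X along the estimated independent axes."""
    return (X - mean) @ unmixing.T


# Synthetic stand-in for document embeddings: three independent,
# non-Gaussian "topic" signals mixed linearly into a 16-dimensional space.
rng = np.random.default_rng(42)
n_docs, dim, n_topics = 2000, 16, 3
true_sources = rng.laplace(size=(n_docs, n_topics))
mixing = rng.standard_normal((n_topics, dim))
doc_embeddings = true_sources @ mixing + 0.01 * rng.standard_normal((n_docs, dim))

mean, unmixing = fit_ica(doc_embeddings, n_topics)
doc_topic = transform(doc_embeddings, mean, unmixing)

# Hypothetical vocabulary: each toy term embedding points along one mixing direction.
vocab = ["a1", "a2", "b1", "b2", "c1", "c2"]
word_source = np.array([0, 0, 1, 1, 2, 2])
vocab_embeddings = 3.0 * mixing[word_source]
term_topic = transform(vocab_embeddings, mean, unmixing)
strongest_axis = np.abs(term_topic).argmax(axis=1)

for word, axis in zip(vocab, strongest_axis):
    print(f"{word}: strongest on axis {axis}")
```

Independent components are identified only up to order and sign, so the sketch compares absolute values when matching terms to axes. With real data, the synthetic arrays would be replaced by embeddings from a sentence encoder, and an established ICA implementation would normally be preferred over this hand-rolled loop.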

Comments:	24 pages, 13 figures (main manuscript has 9 pages and 7 figures); The paper has been adjusted according to reviewers' feedback
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
ACM classes:	I.2.7
Cite as:	arXiv:2406.09556 [cs.LG]
	(or arXiv:2406.09556v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.09556

Submission history

From: Márton Kardos
[v1] Thu, 13 Jun 2024 19:43:38 UTC (12,732 KB)
[v2] Tue, 18 Jun 2024 14:12:18 UTC (12,320 KB)
[v3] Mon, 19 May 2025 11:30:33 UTC (13,323 KB)
