Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances

Zhang, Hanlei; Xu, Hua; Long, Fei; Wang, Xin; Gao, Kai

arXiv:2405.12775 (cs)

[Submitted on 21 May 2024]

Abstract: Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information for discerning complex semantics in unsupervised scenarios. This paper introduces a novel unsupervised multimodal clustering method (UMC), making a pioneering contribution to this field. UMC introduces a unique approach to constructing augmentation views for multimodal data, which are then used to perform pre-training to establish well-initialized representations for subsequent clustering. An innovative strategy is proposed to dynamically select high-quality samples as guidance for representation learning, gauged by the density of each sample's nearest neighbors. In addition, UMC automatically determines the optimal value of the top-$K$ parameter in each cluster to refine sample selection. Finally, both high- and low-quality samples are used to learn representations conducive to effective clustering. We build baselines on benchmark multimodal intent and dialogue act datasets. UMC shows remarkable improvements of 2-6% in clustering metrics over state-of-the-art methods, marking the first successful endeavor in this domain. The complete code and data are available at this https URL.
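
The density-based sample selection mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `knn_density` and `select_high_quality`, the inverse-mean-distance density estimate, and the fixed `keep_ratio` parameter are all assumptions made for illustration. In particular, the abstract says UMC determines the top-$K$ value per cluster automatically, whereas this sketch uses a fixed, hand-set ratio.

```python
import numpy as np


def knn_density(points: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical density estimate: inverse mean distance to the k nearest neighbors."""
    n = len(points)
    k = min(k, n - 1)
    if k < 1:
        # A single point has no neighbors; give it a neutral density.
        return np.ones(n)
    # Pairwise Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    nearest = np.sort(dists, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def select_high_quality(embeddings, labels, k=3, keep_ratio=0.5):
    """Return sorted indices of the densest samples within each cluster.

    keep_ratio is an assumed fixed fraction, standing in for the
    automatic per-cluster top-K selection described in the abstract.
    """
    selected = []
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        density = knn_density(embeddings[idx], k)
        n_keep = max(1, int(round(keep_ratio * len(idx))))
        # Highest density first.
        top = idx[np.argsort(-density)[:n_keep]]
        selected.extend(top.tolist())
    return sorted(selected)


if __name__ == "__main__":
    # Four tightly packed points and one distant outlier in a single cluster.
    emb = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5]], dtype=float)
    labels = np.zeros(5, dtype=int)
    print(select_high_quality(emb, labels, k=2, keep_ratio=0.8))
```

In this toy input the outlier has a much lower density than the four packed points, so it is the one left out of the selected set. A learned representation and a data-driven choice of how many samples to keep per cluster would replace the raw coordinates and the fixed ratio in practice.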

Comments:	Accepted by ACL 2024, Main Conference, Long Paper
Subjects:	Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2405.12775 [cs.MM]
	(or arXiv:2405.12775v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2405.12775

Submission history

From: Hanlei Zhang
[v1] Tue, 21 May 2024 13:24:07 UTC (4,146 KB)
