A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery

Zhang, Hanlei; Xu, Hua; Wang, Xin; Long, Fei; Gao, Kai

doi:10.1109/TKDE.2023.3340732

Computer Science > Computation and Language

arXiv:2304.07699 (cs)

[Submitted on 16 Apr 2023 (v1), last revised 13 Dec 2023 (this version, v3)]


Abstract: New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when limited or no prior knowledge of labeled data is available. To tackle this problem, we propose a novel clustering framework, USNID, for unsupervised and semi-supervised new intent discovery, which has three key technologies. First, it fully utilizes unsupervised or semi-supervised data to mine shallow semantic similarity relations and provide well-initialized representations for clustering. Second, it designs a centroid-guided clustering mechanism to address the issue of cluster allocation inconsistency and provide high-quality self-supervised targets for representation learning. Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters by optimizing both cluster-level and instance-level objectives. We also propose an effective method for estimating the cluster number in open-world scenarios without knowing the number of new intents beforehand. USNID performs exceptionally well on several benchmark intent datasets, achieving new state-of-the-art results in unsupervised and semi-supervised new intent discovery and demonstrating robust performance with different cluster numbers.
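The abstract does not spell out how the centroid-guided mechanism works. "Cluster allocation inconsistency" refers to the fact that re-running k-means after each training epoch can assign arbitrary, permuted ids to the same underlying clusters, so the pseudo-labels used as self-supervised targets would flip between epochs. One common remedy is to match the new centroids to the previous ones and relabel accordingly. The sketch below assumes that matching-by-squared-distance approach; it is an illustration, not the authors' implementation, and the function names `align_centroids` and `relabel` are hypothetical. For clarity it brute-forces all permutations, which is only feasible for a handful of clusters; a practical version would solve the same assignment problem with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_centroids(prev: Sequence[Vector], curr: Sequence[Vector]) -> List[int]:
    """Match current centroids to previous-epoch centroids.

    Returns mapping where mapping[j] is the previous cluster id assigned to
    current cluster j, chosen to minimize total squared distance.
    Brute force over all K! permutations (illustration only, tiny K).
    """
    k = len(curr)
    if len(prev) != k:
        raise ValueError("previous and current epochs must have the same K")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum(sq_dist(prev[perm[j]], curr[j]) for j in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm)


def relabel(labels: Sequence[int], mapping: Sequence[int]) -> List[int]:
    """Rewrite this epoch's cluster ids so they agree with the previous epoch."""
    return [mapping[c] for c in labels]


if __name__ == "__main__":
    # Previous epoch: three clusters near (0,0), (10,10), (0,10).
    prev = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
    # New k-means run finds the same clusters, but with shuffled ids.
    curr = [[10.0, 9.0], [1.0, 10.0], [0.0, 1.0]]
    mapping = align_centroids(prev, curr)
    print(mapping)                          # [1, 2, 0]
    print(relabel([0, 1, 2, 2], mapping))   # [1, 2, 0, 0]
```

After relabeling, a given sample keeps the same pseudo-label across epochs as long as its cluster is stable, which is the property a consistent self-supervised target needs.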

Comments:	Accepted by IEEE TKDE
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2304.07699 [cs.CL]
	(or arXiv:2304.07699v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.07699
Journal reference:	IEEE Transactions on Knowledge and Data Engineering 2023
Related DOI:	https://doi.org/10.1109/TKDE.2023.3340732

Submission history

From: Hanlei Zhang
[v1] Sun, 16 Apr 2023 05:30:42 UTC (624 KB)
[v2] Sat, 9 Dec 2023 14:26:44 UTC (1,122 KB)
[v3] Wed, 13 Dec 2023 01:39:20 UTC (1,122 KB)
