TK-KNN: A Balanced Distance-Based Pseudo Labeling Approach for Semi-Supervised Intent Classification

Botzer, Nicholas; Vasquez, David; Weninger, Tim; Laradji, Issam

arXiv:2310.11607 (cs)

[Submitted on 17 Oct 2023]

Abstract:The ability to detect intent in dialogue systems has become increasingly important in modern technology. These systems often generate a large amount of unlabeled data, and manually labeling this data requires substantial human effort. Semi-supervised methods attempt to reduce this cost by training a model on a few labeled examples and then assigning pseudo-labels to the subset of unlabeled examples whose model prediction confidence exceeds a certain threshold. However, one particularly perilous consequence of these methods is the risk of selecting an imbalanced set of examples across classes, which could lead to poor labels. In the present work, we describe Top-K K-Nearest Neighbor (TK-KNN), which uses a more robust pseudo-labeling approach based on distance in the embedding space while maintaining a balanced set of pseudo-labeled examples across classes through a ranking-based approach. Experiments on several datasets, including the popular CLINC150 and Banking77, show that TK-KNN outperforms existing models, particularly when labeled data is scarce. Code is available at this https URL
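
The abstract names two ingredients: a score based on distance in the embedding space, and a ranking step that keeps the pseudo-labeled set balanced across classes. The sketch below illustrates that general idea only and is not the authors' implementation. The function name `balanced_top_k_pseudo_labels`, its parameters, the use of cosine similarity, and the choice to score each unlabeled example by its single nearest labeled neighbor are all assumptions made for illustration; the abstract does not specify the scoring rule.

```python
import numpy as np


def balanced_top_k_pseudo_labels(labeled_emb, labeled_y, unlabeled_emb, n_classes, k):
    """Select at most k unlabeled examples per class (hypothetical helper).

    Each unlabeled example takes the label of its nearest labeled neighbor
    (assumed: cosine similarity) and is scored by that similarity. Within
    each class, candidates are ranked by score and only the top k are kept,
    so no class can contribute more than k pseudo-labels.
    """
    labeled_emb = np.asarray(labeled_emb, dtype=float)
    unlabeled_emb = np.asarray(unlabeled_emb, dtype=float)
    labeled_y = np.asarray(labeled_y)

    # L2-normalize rows so the dot product equals cosine similarity.
    a = labeled_emb / np.linalg.norm(labeled_emb, axis=1, keepdims=True)
    b = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
    sim = b @ a.T  # shape: (n_unlabeled, n_labeled)

    nearest = sim.argmax(axis=1)   # index of the closest labeled example
    pred = labeled_y[nearest]      # pseudo-label for every unlabeled example
    score = sim.max(axis=1)        # similarity to that closest example

    selected = {}
    for c in range(n_classes):
        candidates = np.flatnonzero(pred == c)
        # Rank this class's candidates from most to least similar.
        ranked = candidates[np.argsort(-score[candidates])]
        selected[c] = ranked[:k]
    return selected, pred


if __name__ == "__main__":
    labeled = [[1.0, 0.0], [0.0, 1.0]]
    labels = [0, 1]
    unlabeled = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.6], [0.1, 1.0]]
    chosen, pseudo = balanced_top_k_pseudo_labels(labeled, labels, unlabeled, 2, 2)
    for cls, idx in chosen.items():
        print(cls, idx.tolist())
```

A plain confidence threshold would accept every example above the cutoff, so an easy class could dominate the pseudo-labeled set. Capping each class at `k` ranked candidates is one simple way to avoid that imbalance.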

Comments:	9 pages, 6 figures, 4 tables
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.11607 [cs.LG]
	(or arXiv:2310.11607v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.11607

Submission history

From: Nicholas Botzer [view email]
[v1] Tue, 17 Oct 2023 22:00:42 UTC (1,021 KB)
