Weakly-Supervised Neural Text Classification

Meng, Yu; Shen, Jiaming; Zhang, Chao; Han, Jiawei

doi:10.1145/3269206.3271737

arXiv:1809.01478 (cs)

[Submitted on 2 Sep 2018 (v1), last revised 12 Sep 2018 (this version, v2)]

Abstract: Deep neural networks are gaining popularity for the classic text classification task, owing to their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models, and they support only limited types of supervision. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method is flexible enough to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves strong performance without requiring excessive training data and significantly outperforms baseline methods.
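
The two-module pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It makes several assumptions: seed information takes the form of class keywords, pseudo-documents are produced by mixing seed keywords with a background vocabulary, a small Naive Bayes model stands in for the neural classifier, and self-training adds hard labels above a confidence threshold. All names (`generate_pseudo_docs`, `NaiveBayes`, `self_train`, `seed_ratio`, `threshold`) and all data are hypothetical.

```python
import math
import random
from collections import Counter


def generate_pseudo_docs(seeds, background, n_docs=50, doc_len=20,
                         seed_ratio=0.5, rng=None):
    """Module 1 (sketch): sample pseudo-labeled documents.

    Each word is drawn from the class's seed keywords with probability
    seed_ratio, otherwise from a shared background vocabulary.
    """
    rng = rng or random.Random(0)
    docs = []
    for label, keywords in seeds.items():
        for _ in range(n_docs):
            words = [rng.choice(keywords) if rng.random() < seed_ratio
                     else rng.choice(background)
                     for _ in range(doc_len)]
            docs.append((words, label))
    return docs


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing.

    Assumption: used here only as a lightweight stand-in for a neural model.
    """

    def fit(self, docs):
        self.labels = sorted({label for _, label in docs})
        self.counts = {label: Counter() for label in self.labels}
        self.doc_counts = Counter()
        for words, label in docs:
            self.counts[label].update(words)
            self.doc_counts[label] += 1
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, words):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.labels:
            denom = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                if w in self.vocab:  # out-of-vocabulary words are skipped
                    score += math.log((self.counts[label][w] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract max for numerical stability
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def self_train(model, pseudo_docs, unlabeled, threshold=0.9, rounds=3):
    """Module 2 (sketch): pre-train on pseudo-documents, then repeatedly
    add confidently predicted real documents and refit."""
    labeled = list(pseudo_docs)
    remaining = list(unlabeled)
    for _ in range(rounds):
        model.fit(labeled)
        confident, rest = [], []
        for words in remaining:
            probs = model.predict_proba(words)
            label = max(probs, key=probs.get)
            target = confident if probs[label] >= threshold else rest
            target.append((words, label))
        if not confident:
            break
        labeled.extend(confident)
        remaining = [words for words, _ in rest]
    return model.fit(labeled)


# Hypothetical seed keywords and unlabeled documents, for illustration only.
seeds = {"sports": ["game", "team", "score"],
         "politics": ["election", "senate", "vote"]}
background = ["the", "a", "of", "said", "today"]
pseudo = generate_pseudo_docs(seeds, background)
unlabeled = [["team", "won", "the", "game"], ["senate", "vote", "today"]]
model = self_train(NaiveBayes(), pseudo, unlabeled)
print(model.predict_proba(["team", "score", "game"]))
```

The split into a generator and a refinement loop mirrors the two modules named in the abstract; the sampling scheme, the classifier, and the thresholding rule are placeholders that a real system would replace with the paper's own components.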

Comments:	CIKM 2018 Full Paper
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1809.01478 [cs.IR]
	(or arXiv:1809.01478v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1809.01478
Related DOI:	https://doi.org/10.1145/3269206.3271737

Submission history

From: Yu Meng
[v1] Sun, 2 Sep 2018 02:56:25 UTC (2,481 KB)
[v2] Wed, 12 Sep 2018 04:34:59 UTC (4,655 KB)
