Detecting Text Formality: A Study of Text Classification Approaches

Dementieva, Daryna; Trifinov, Ivan; Likhachev, Andrey; Panchenko, Alexander

arXiv:2204.08975v1 (cs)

[Submitted on 19 Apr 2022 (this version), latest version 8 Sep 2023 (v2)]

Abstract: Formality is an important characteristic of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks, such as retrieval of texts with a desired formality level, integration in language learning and document editing platforms, or evaluating the desired conversation tone by chatbots. Recently, two large-scale datasets featuring formality annotation were introduced for multiple languages. However, they were primarily used for training style transfer models, although detecting text formality on its own may also be a useful application. This work presents the first systematic study of formality detection methods based on current (and more classic) machine learning methods and delivers the best-performing models for public usage. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that BiLSTM-based models outperform transformer-based ones on the formality classification task. We release formality detection models for several languages that yield state-of-the-art results and have tested cross-lingual capabilities.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2204.08975 [cs.CL]
	(or arXiv:2204.08975v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.08975

Submission history

From: Daryna Dementieva
[v1] Tue, 19 Apr 2022 16:23:07 UTC (164 KB)
[v2] Fri, 8 Sep 2023 09:11:02 UTC (47 KB)
