Personalized Prediction of Offensive News Comments by Considering the Characteristics of Commenters
Authors:
Teruki Nakahara,
Taketoshi Ushiama
Abstract:
When reading news articles on social networking services and news sites, readers can view comments that other people have posted on these articles. Reading these comments helps a reader understand public opinion about the news and often helps in grasping its overall picture. However, these comments often contain offensive language that readers would prefer not to read. This study aims to predict such offensive comments in order to improve the reader's experience of reading comments. To account for the diversity of readers' values, the proposed method predicts offensive news comments for each reader based on feedback from a small number of news comments that the reader rated as "offensive" in the past. In addition, we use a machine learning model that considers the characteristics of the commenters, so that predictions are independent of the words and topics in the news comments. The experimental results show that prediction can be personalized even when the amount of reader feedback used for prediction is limited. In particular, the proposed method, which considers the commenters' characteristics, has a low probability of falsely detecting offensive comments.
Submitted 26 December, 2022;
originally announced December 2022.
Micro-Clustering: Finding Small Clusters in Large Diversity
Authors:
Takeaki Uno,
Hiroki Maegawa,
Takanobu Nakahara,
Yukinobu Hamuro,
Ryo Yoshinaka,
Makoto Tatsuta
Abstract:
We address an unsupervised soft-clustering problem called micro-clustering. The aim is to enumerate all groups of records that are strongly related to each other, whereas standard clustering methods separate records at sparse parts of the data. Formulating the micro-clustering problem is non-trivial. Clique mining in a similarity graph is a typical approach, but it yields a huge number of cliques, many of which are similar to one another. We propose a new concept, data polishing. The huge number of solutions can be attributed to the groups not being clear in the data; that is, the boundaries of the groups are blurred by noise, uncertainty, falsehoods, missing data, etc. Data polishing clarifies the groups by perturbing the data. Specifically, dense subgraphs that would correspond to clusters are replaced by cliques. The clusters then appear as maximal cliques, so the number of maximal cliques is drastically reduced. We also propose an efficient algorithm that is applicable even to large-scale data. Computational experiments showed the efficiency of our algorithm: the number of solutions is small (e.g., 1,000), the members of each group are deeply related, and the computation time is short.
Submitted 6 June, 2016; v1 submitted 11 July, 2015;
originally announced July 2015.
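To make the data polishing idea in the abstract above concrete, the following is a minimal sketch, not the authors' algorithm. The abstract only says that dense subgraphs are replaced by cliques; the specific rule used here (connect two vertices when the Jaccard similarity of their closed neighborhoods reaches a threshold, disconnect them otherwise, and repeat until nothing changes) is an assumption made for illustration. The function names `polish` and `maximal_cliques`, the threshold of 0.5, and the toy graph are all hypothetical. Maximal cliques are listed with a plain Bron-Kerbosch recursion, and the quadratic pair loop is only suitable for small examples.

```python
from itertools import combinations


def closed_neighborhoods(vertices, edges):
    """Map each vertex to its neighbors plus the vertex itself."""
    nbrs = {v: {v} for v in vertices}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def polish(vertices, edges, threshold=0.5, max_rounds=10):
    """Illustrative polishing rule (an assumption, see lead-in):
    keep or add edge {u, v} iff the Jaccard similarity of the closed
    neighborhoods of u and v is at least `threshold`."""
    edges = {frozenset(e) for e in edges}
    for _ in range(max_rounds):
        nbrs = closed_neighborhoods(vertices, edges)
        new_edges = set()
        for u, v in combinations(vertices, 2):
            inter = len(nbrs[u] & nbrs[v])
            union = len(nbrs[u] | nbrs[v])
            if inter / union >= threshold:
                new_edges.add(frozenset((u, v)))
        if new_edges == edges:  # fixed point reached
            break
        edges = new_edges
    return edges


def maximal_cliques(vertices, edges):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(vertices), set())
    return found


# Toy graph: two 4-vertex groups, each missing one edge, joined by a noisy edge.
vertices = list(range(8))
noisy_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),   # group A, (0, 3) missing
               (4, 5), (4, 6), (5, 6), (5, 7), (6, 7),   # group B, (4, 7) missing
               (3, 4)]                                    # noise between groups

before = maximal_cliques(vertices, noisy_edges)
after = maximal_cliques(vertices, polish(vertices, noisy_edges))
print(len(before), "maximal cliques before polishing")
print(len(after), "maximal cliques after polishing:", sorted(sorted(c) for c in after))
```

On this toy input the noisy graph has five overlapping maximal cliques, while the polished graph has two, one per group, which mirrors the reduction in the number of solutions that the abstract describes.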