arXiv:2012.13235

Detecting Hateful Memes Using a Multimodal Deep Ensemble

Abstract: While significant progress has been made using machine learning algorithms to detect hate speech, important technical challenges still remain to be solved in order to bring their performance closer to human accuracy. We investigate several of the most recent visual-linguistic Transformer architectures and propose improvements to increase their performance for this task. The proposed model outperforms the baselines by a large margin and ranks 5th on the leaderboard out of 3,100+ participants.

Submitted 24 December, 2020; originally announced December 2020.

Comments: 6 pages, NeurIPS 2020, The Hateful Memes Challenge Workshop at NeurIPS 2020

Journal ref: The Hateful Memes Challenge Workshop at NeurIPS 2020

arXiv:1609.02728

Predicting the future relevance of research institutions - The winning solution of the KDD Cup 2016

Authors: Vlad Sandulescu, Mihai Chiru

Abstract: The world's collective knowledge is evolving through research and new scientific discoveries. It is becoming increasingly difficult to objectively rank the impact research institutes have on global advancements. However, since the funding, governmental support, staff and students quality all mirror the projected quality of the institution, it becomes essential to measure the affiliation's rating in a transparent and widely accepted way. We propose and investigate several methods to rank affiliations based on the number of their accepted papers at future academic conferences. We carry out our investigation using publicly available datasets such as the Microsoft Academic Graph, a heterogeneous graph which contains various information about academic papers. We analyze several models, starting with a simple probabilities-based method and then gradually expand our training dataset, engineer many more features and use mixed models and gradient boosted decision trees models to improve our predictions.

Submitted 9 September, 2016; originally announced September 2016.

Comments: 6 pages, KDD 2016, KDD Cup 2016

Journal ref: The KDD Cup Workshop at KDD 2016

arXiv:1609.02727

doi 10.1145/2740908.2742570

Detecting Singleton Review Spammers Using Semantic Similarity

Authors: Vlad Sandulescu, Martin Ester

Abstract: Online reviews have increasingly become a very important resource for consumers when making purchases. Though it is becoming more and more difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior works on the opinion spam problem mostly considered classifying fake reviews using behavioral user patterns. They focused on prolific users who write more than a couple of reviews, discarding one-time reviewers. The number of singleton reviewers however is expected to be high for many review websites. While behavioral patterns are effective when dealing with elite users, for one-time reviewers, the review text needs to be exploited. In this paper we tackle the problem of detecting fake reviews written by the same person using multiple names, posting each review under a different name. We propose two methods to detect similar reviews and show the results generally outperform the vectorial similarity measures used in prior works. The first method extends the semantic similarity between words to the reviews level. The second method is based on topic modeling and exploits the similarity of the reviews topic distributions using two models: bag-of-words and bag-of-opinion-phrases. The experiments were conducted on reviews from three different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and Ott dataset (800 reviews).

Submitted 9 September, 2016; originally announced September 2016.

Comments: 6 pages, WWW 2015

ACM Class: I.7.0; J.4

Journal ref: WWW '15 Companion Proceedings of the 24th International Conference on World Wide Web, 2015, p.971-976

Showing 1–3 of 3 results for author: Sandulescu, V