-
Israel-Hamas war through Telegram, Reddit and Twitter
Authors:
Despoina Antonakaki,
Sotiris Ioannidis
Abstract:
The Israeli-Palestinian conflict started on 7 October 2023, have resulted thus far to over 48,000 people killed including more than 17,000 children with a majority from Gaza, more than 30,000 people injured, over 10,000 missing, and over 1 million people displaced, fleeing conflict zones. The infrastructure damage includes the 87\% of housing units, 80\% of public buildings and 60\% of cropland 17…
▽ More
The Israeli-Palestinian conflict started on 7 October 2023, have resulted thus far to over 48,000 people killed including more than 17,000 children with a majority from Gaza, more than 30,000 people injured, over 10,000 missing, and over 1 million people displaced, fleeing conflict zones. The infrastructure damage includes the 87\% of housing units, 80\% of public buildings and 60\% of cropland 17 out of 36 hospitals, 68\% of road networks and 87\% of school buildings damaged. This conflict has as well launched an online discussion across various social media platforms. Telegram was no exception due to its encrypted communication and highly involved audience. The current study will cover an analysis of the related discussion in relation to different participants of the conflict and sentiment represented in those discussion. To this end, we prepared a dataset of 125K messages shared on channels in Telegram spanning from 23 October 2025 until today. Additionally, we apply the same analysis in two publicly available datasets from Twitter containing 2001 tweets and from Reddit containing 2M opinions. We apply a volume analysis across the three datasets, entity extraction and then proceed to BERT topic analysis in order to extract common themes or topics. Next, we apply sentiment analysis to analyze the emotional tone of the discussions. Our findings hint at polarized narratives as the hallmark of how political factions and outsiders mold public opinion. We also analyze the sentiment-topic prevalence relationship, detailing the trends that may show manipulation and attempts of propaganda by the involved parties. This will give a better understanding of the online discourse on the Israel-Palestine conflict and contribute to the knowledge on the dynamics of social media communication during geopolitical crises.
△ Less
Submitted 30 January, 2025;
originally announced February 2025.
-
Russo-Ukrainian War: Prediction and explanation of Twitter suspension
Authors:
Alexander Shevtsov,
Despoina Antonakaki,
Ioannis Lamprou,
Ioannis Kontogiorgakis,
Polyvios Pratikakis,
Sotiris Ioannidis
Abstract:
On 24 February 2022, Russia invaded Ukraine, starting what is now known as the Russo-Ukrainian War, initiating an online discourse on social media. Twitter as one of the most popular SNs, with an open and democratic character, enables a transparent discussion among its large user base. Unfortunately, this often leads to Twitter's policy violations, propaganda, abusive actions, civil integrity viol…
▽ More
On 24 February 2022, Russia invaded Ukraine, starting what is now known as the Russo-Ukrainian War, initiating an online discourse on social media. Twitter as one of the most popular SNs, with an open and democratic character, enables a transparent discussion among its large user base. Unfortunately, this often leads to Twitter's policy violations, propaganda, abusive actions, civil integrity violation, and consequently to user accounts' suspension and deletion. This study focuses on the Twitter suspension mechanism and the analysis of shared content and features of the user accounts that may lead to this. Toward this goal, we have obtained a dataset containing 107.7M tweets, originating from 9.8 million users, using Twitter API. We extract the categories of shared content of the suspended accounts and explain their characteristics, through the extraction of text embeddings in junction with cosine similarity clustering. Our results reveal scam campaigns taking advantage of trending topics regarding the Russia-Ukrainian conflict for Bitcoin and Ethereum fraud, spam, and advertisement campaigns. Additionally, we apply a machine learning methodology including a SHapley Additive explainability model to understand and explain how user accounts get suspended.
△ Less
Submitted 27 December, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline
Authors:
Alexander Shevtsov,
Despoina Antonakaki,
Ioannis Lamprou,
Polyvios Pratikakis,
Sotiris Ioannidis
Abstract:
Twitter, as one of the most popular social networks, provides a platform for communication and online discourse. Unfortunately, it has also become a target for bots and fake accounts, resulting in the spread of false information and manipulation. This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model develo…
▽ More
Twitter, as one of the most popular social networks, provides a platform for communication and online discourse. Unfortunately, it has also become a target for bots and fake accounts, resulting in the spread of false information and manipulation. This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development. Through this pipeline, we develop a comprehensive bot detection model named BotArtist, based on user profile features. SAMLP leverages nine distinct publicly available datasets to train the BotArtist model. To assess BotArtist's performance against current state-of-the-art solutions, we evaluate 35 existing Twitter bot detection methods, each utilizing a diverse range of features. Our comparative evaluation of BotArtist and these existing methods, conducted across nine public datasets under standardized conditions, reveals that the proposed model outperforms existing solutions by almost 10% in terms of F1-score, achieving an average score of 83.19% and 68.5% over specific and general approaches, respectively. As a result of this research, we provide one of the largest labeled Twitter bot datasets. The dataset contains extracted features combined with BotArtist predictions for 10,929,533 Twitter user profiles, collected via Twitter API during the 2022 Russo-Ukrainian War over a 16-month period. This dataset was created based on [Shevtsov et al., 2022a] where the original authors share anonymized tweets discussing the Russo-Ukrainian war, totaling 127,275,386 tweets. The combination of the existing textual dataset and the provided labeled bot and human profiles will enable future development of more advanced bot detection large language models in the post-Twitter API era.
△ Less
Submitted 14 April, 2025; v1 submitted 31 May, 2023;
originally announced June 2023.
-
Twitter Dataset on the Russo-Ukrainian War
Authors:
Alexander Shevtsov,
Christos Tzagkarakis,
Despoina Antonakaki,
Polyvios Pratikakis,
Sotiris Ioannidis
Abstract:
On 24 February 2022, Russia invaded Ukraine, also known now as Russo-Ukrainian War. We have initiated an ongoing dataset acquisition from Twitter API. Until the day this paper was written the dataset has reached the amount of 57.3 million tweets, originating from 7.7 million users. We apply an initial volume and sentiment analysis, while the dataset can be used to further exploratory investigation…
▽ More
On 24 February 2022, Russia invaded Ukraine, also known now as Russo-Ukrainian War. We have initiated an ongoing dataset acquisition from Twitter API. Until the day this paper was written the dataset has reached the amount of 57.3 million tweets, originating from 7.7 million users. We apply an initial volume and sentiment analysis, while the dataset can be used to further exploratory investigation towards topic analysis, hate speech, propaganda recognition, or even show potential malicious entities like botnets.
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
Identification of Twitter Bots Based on an Explainable Machine Learning Framework: The US 2020 Elections Case Study
Authors:
Alexander Shevtsov,
Christos Tzagkarakis,
Despoina Antonakaki,
Sotiris Ioannidis
Abstract:
Twitter is one of the most popular social networks attracting millions of users, while a considerable proportion of online discourse is captured. It provides a simple usage framework with short messages and an efficient application programming interface (API) enabling the research community to study and analyze several aspects of this social network. However, the Twitter usage simplicity can lead…
▽ More
Twitter is one of the most popular social networks attracting millions of users, while a considerable proportion of online discourse is captured. It provides a simple usage framework with short messages and an efficient application programming interface (API) enabling the research community to study and analyze several aspects of this social network. However, the Twitter usage simplicity can lead to malicious handling by various bots. The malicious handling phenomenon expands in online discourse, especially during the electoral periods, where except the legitimate bots used for dissemination and communication purposes, the goal is to manipulate the public opinion and the electorate towards a certain direction, specific ideology, or political party. This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. To this end, a supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm, where the hyper-parameters are tuned via cross-validation. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions by calculating feature importance, using the game theoretic-based Shapley values. Experimental evaluation on distinct Twitter datasets demonstrate the superiority of our approach, in terms of bot detection accuracy, when compared against a recent state-of-the-art Twitter bot detection method.
△ Less
Submitted 14 December, 2021; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Discovery and classification of Twitter bots
Authors:
Alexander Shevtsov Alexander Shevtsov,
Maria Oikonomidou,
Despoina Antonakaki,
Polyvios Pratikakis,
Alexandros Kanterakis,
Sotiris Ioannidis,
Paraskevi Fragopoulou
Abstract:
A very large number of people use Online Social Networks daily. Such platforms thus become attractive targets for agents that seek to gain access to the attention of large audiences, and influence perceptions or opinions. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the socia…
▽ More
A very large number of people use Online Social Networks daily. Such platforms thus become attractive targets for agents that seek to gain access to the attention of large audiences, and influence perceptions or opinions. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and to create an illusion of community behavior, amplifying their message and increasing persuasion.
This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyzed a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users for a period of 36 months. We detected over a million events where seemingly unrelated accounts tweeted nearly identical content at nearly the same time. We filtered these concurrent content injection events and detected a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or in part controlled and orchestrated by the same software. We found botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between bot accounts and human users, as well as botnet interaction with user communities and Twitter trending topics.
△ Less
Submitted 29 October, 2020;
originally announced October 2020.
-
Analysis of Twitter and YouTube during USelections 2020
Authors:
Alexander Shevtsov,
Maria Oikonomidou,
Despoina Antonakaki,
Polyvios Pratikakis,
Sotiris Ioannidis
Abstract:
The presidential elections in the United States on 3 November 2020 have caused extensive discussions on social media. A part of the content on US elections is organic, coming from users discussing their opinions of the candidates, political positions, or relevant content presented on television. Another significant part of the content generated originates from organized campaigns, both official an…
▽ More
The presidential elections in the United States on 3 November 2020 have caused extensive discussions on social media. A part of the content on US elections is organic, coming from users discussing their opinions of the candidates, political positions, or relevant content presented on television. Another significant part of the content generated originates from organized campaigns, both official and by astroturfing.
In this study, we obtain approximately 17.5M tweets containing 3M users, based on prevalent hashtags related to US election 2020, as well as the related YouTube links, contained in the Twitter dataset, likes, dislikes and comments of the videos and conduct volume, sentiment and graph analysis on the communities formed.
Particularly, we study the daily traffic per prevalent hashtags, plot the retweet graph from July to September 2020, show how its main connected component becomes denser in the period closer to the elections and highlight the two main entities ('Biden' and 'Trump'). Additionally, we gather the related YouTube links contained in the previous dataset and perform sentiment analysis. The results on sentiment analysis on the Twitter corpus and the YouTube metadata gathered, show the positive and negative sentiment for the two entities throughout this period. The results of sentiment analysis indicate that 45.7% express positive sentiment towards Trump in Twitter and 33.8% positive sentiment towards Biden, while 14.55% of users express positive sentiment in YouTube metadata gathered towards Trump and 8.7% positive sentiment towards Biden. Our analysis fill the gap between the connection of offline events and their consequences in social media by monitoring important events in real world and measuring public volume and sentiment before and after the event in social media.
△ Less
Submitted 10 November, 2020; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Evolving Twitter: an experimental analysis of graph properties of the social graph
Authors:
Despoina Antonakaki,
Sotiris Ioannidis,
Paraskevi Fragopoulou
Abstract:
Twitter is one of the most prominent Online Social Networks. It covers a significant part of the online worldwide population~20% and has impressive growth rates. The social graph of Twitter has been the subject of numerous studies since it can reveal the intrinsic properties of large and complex online communities. Despite the plethora of these studies, there is a limited cover on the properties o…
▽ More
Twitter is one of the most prominent Online Social Networks. It covers a significant part of the online worldwide population~20% and has impressive growth rates. The social graph of Twitter has been the subject of numerous studies since it can reveal the intrinsic properties of large and complex online communities. Despite the plethora of these studies, there is a limited cover on the properties of the social graph while they evolve over time. Moreover, due to the extreme size of this social network (millions of nodes, billions of edges), there is a small subset of possible graph properties that can be efficiently measured in a reasonable timescale. In this paper we propose a sampling framework that allows the estimation of graph properties on large social networks. We apply this framework to a subset of Twitter's social network that has 13.2 million users, 8.3 billion edges and covers the complete Twitter timeline (from April 2006 to January 2015). We derive estimation on the time evolution of 24 graph properties many of which have never been measured on large social networks. We further discuss how these estimations shed more light on the inner structure and growth dynamics of Twitter's social network.
△ Less
Submitted 5 October, 2015;
originally announced October 2015.