-
Redefining Toxicity: An Objective and Context-Aware Approach for Stress-Level-Based Detection
Authors:
Sergey Berezin,
Reza Farahbakhsh,
Noel Crespi
Abstract:
The fundamental problem of toxicity detection lies in the fact that the term "toxicity" is ill-defined. Such uncertainty causes researchers to rely on subjective and vague data during model training, which leads to non-robust and inaccurate results, following the 'garbage in - garbage out' paradigm. This study introduces a novel, objective, and context-aware framework for toxicity detection, leveraging stress levels as a key determinant of toxicity. We propose a new definition, metric, and training approach as parts of our framework and demonstrate its effectiveness using a dataset we collected.
Submitted 20 March, 2025;
originally announced March 2025.
-
The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs
Authors:
Sergey Berezin,
Reza Farahbakhsh,
Noel Crespi
Abstract:
We present a novel class of jailbreak adversarial attacks on LLMs, termed Task-in-Prompt (TIP) attacks. Our approach embeds sequence-to-sequence tasks (e.g., cipher decoding, riddles, code execution) into the model's prompt to indirectly generate prohibited inputs. To systematically assess the effectiveness of these attacks, we introduce the PHRYGE benchmark. We demonstrate that our techniques successfully circumvent safeguards in six state-of-the-art language models, including GPT-4o and LLaMA 3.2. Our findings highlight critical weaknesses in current LLM safety alignments and underscore the urgent need for more sophisticated defence strategies.
Warning: this paper contains examples of unethical inquiries used solely for research purposes.
Submitted 4 February, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
Authors:
Aditya Narayan Sankaran,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case, in Indian languages using Few Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection using the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200) evaluating the impact of limited data on performance. Additionally, a feature visualization study was conducted to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
Submitted 13 December, 2024; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity
Authors:
Sergey Berezin,
Reza Farahbakhsh,
Noel Crespi
Abstract:
We introduce a novel family of adversarial attacks that exploit the inability of language models to interpret ASCII art. To evaluate these attacks, we propose the ToxASCII benchmark and develop two custom ASCII art fonts: one leveraging special tokens and another using text-filled letter shapes. Our attacks achieve a perfect 1.0 Attack Success Rate across ten models, including OpenAI's o1-preview and LLaMA 3.1.
Warning: this paper contains examples of toxic language used for research purposes.
Submitted 9 October, 2024; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Adversarial Botometer: Adversarial Analysis for Social Bot Detection
Authors:
Shaghayegh Najari,
Davood Rafiee,
Mostafa Salehi,
Reza Farahbakhsh
Abstract:
Social bots play a significant role in many online social networks (OSN) as they imitate human behavior. This fact raises difficult questions about their capabilities and potential risks. Given the recent advances in Generative AI (GenAI), social bots are capable of producing highly realistic and complex content that mimics human creativity. As malicious social bots emerge to deceive people with their unrealistic content, identifying them and distinguishing the content they produce has become a real challenge for numerous social platforms. Several approaches to this problem have already been proposed in the literature, but the proposed solutions have not been widely evaluated. To address this issue, we evaluate the behavior of a text-based bot detector in a competitive environment under several scenarios. First, the tug-of-war between a bot and a bot detector is examined. It is interesting to analyze which party is more likely to prevail and which circumstances influence these expectations. In this regard, we model the problem as a synthetic adversarial game in which a conversational bot and a bot detector are engaged in strategic online interactions. Second, the bot detection model is evaluated under attack examples generated by a social bot; to this end, we poison the dataset with attack examples and evaluate the model performance under this condition. Finally, to investigate the impact of the dataset, a cross-domain analysis is performed. Through our comprehensive evaluation of different categories of social bots using two benchmark datasets, we were able to demonstrate some achievements that could be utilized in future work.
Submitted 3 May, 2024;
originally announced May 2024.
-
No offence, Bert -- I insult only humans! Multiple addressees sentence-level attack on toxicity detection neural network
Authors:
Sergey Berezin,
Reza Farahbakhsh,
Noel Crespi
Abstract:
We introduce a simple yet efficient sentence-level attack on black-box toxicity detector models. By adding several positive words or sentences to the end of a hateful message, we are able to change the prediction of a neural network and pass the toxicity detection system check. This approach is shown to work in seven languages from three different language families. We also describe a defence mechanism against the aforementioned attack and discuss its limitations.
Submitted 19 October, 2023;
originally announced October 2023.
-
On the definition of toxicity in NLP
Authors:
Sergey Berezin,
Reza Farahbakhsh,
Noel Crespi
Abstract:
The fundamental problem in the toxicity detection task lies in the fact that toxicity is ill-defined. This causes us to rely on subjective and vague data when training models, which leads to non-robust and inaccurate results: garbage in - garbage out.
This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. Alongside it, we also describe possible ways of applying this new definition to dataset creation and model training.
Submitted 19 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder
Authors:
Khouloud Mnassri,
Praboda Rajapaksha,
Reza Farahbakhsh,
Noel Crespi
Abstract:
The emergence of social media platforms has fundamentally altered how people communicate, and one result of this development is an increase in abusive content online. Automatically detecting this content is therefore essential for banning inappropriate information and reducing toxicity and violence on social media platforms. Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models; however, they consider only abusive content features derived from annotated datasets. This paper presents a multi-task joint learning approach that incorporates external emotional features extracted from other corpora to deal with the imbalance and scarcity of labeled datasets. Our analysis uses two well-known Transformer-based models, BERT and mBERT, where the latter is used to address abusive content detection in multilingual scenarios. Our model jointly learns abusive content detection and emotional features by sharing representations through the transformers' shared encoder. This approach increases data efficiency, reduces overfitting via shared representations, and ensures fast learning by leveraging auxiliary information. Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets. Our multi-task hate speech detection model exhibited a 3% performance improvement over baseline models, but the performance gain of multi-task models was not significant for the offensive language detection task. More interestingly, in both tasks, multi-task models exhibit fewer false positive errors than the single-task scenario.
Submitted 17 February, 2023;
originally announced February 2023.
-
BERT-based Ensemble Approaches for Hate Speech Detection
Authors:
Khouloud Mnassri,
Praboda Rajapaksha,
Reza Farahbakhsh,
Noel Crespi
Abstract:
With the freedom of communication provided by online social media, hate speech has been increasingly generated. This leads to cyber conflicts affecting social life at the individual and national levels. As a result, hateful content classification is increasingly in demand for filtering hate content before it is sent to social networks. This paper focuses on classifying hate speech in social media using multiple deep models that are implemented by integrating recent transformer-based language models, such as BERT, with neural networks. To improve classification performance, we evaluated several ensemble techniques, including soft voting, maximum value, hard voting and stacking. We used three publicly available Twitter datasets (Davidson, HatEval2019, OLID) that were created to identify offensive language. We fused all these datasets to generate a single dataset (DHO dataset), which is more balanced across different labels, to perform multi-label classification. Our experiments were conducted on the Davidson dataset and the DHO corpus. The latter gave the best overall results, especially F1 macro score, even though it required more resources (execution time and memory). The experiments have shown good results, especially for the ensemble models, where stacking gave an F1 score of 97% on the Davidson dataset and aggregating ensembles gave 77% on the DHO dataset.
Submitted 15 September, 2022; v1 submitted 14 September, 2022;
originally announced September 2022.
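The voting-based ensemble techniques named in this abstract (soft voting, maximum value, hard voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-class label order, and the probability values are all hypothetical, and stacking is left out because it requires training a meta-classifier.

```python
from collections import Counter


def soft_vote(probs):
    """Average each class probability across models and pick the largest."""
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def hard_vote(probs):
    """Let each model vote for its own top class; the most common vote wins."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in probs]
    return Counter(votes).most_common(1)[0][0]


def max_value(probs):
    """Pick the class that receives the single highest probability from any model."""
    n_classes = len(probs[0])
    best = [max(p[c] for p in probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: best[c])


# Hypothetical outputs of three classifiers for one tweet,
# with an assumed class order of (hate, offensive, neither).
example = [
    [0.55, 0.45, 0.00],
    [0.55, 0.45, 0.00],
    [0.05, 0.95, 0.00],
]

print(soft_vote(example), hard_vote(example), max_value(example))
```

In this made-up example two models lean slightly towards class 0 while one is very confident in class 1, so hard voting and the two probability-based rules disagree. That is the kind of case where the choice of ensemble rule matters.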
-
Transfer Learning for Multi-lingual Tasks -- a Survey
Authors:
Amir Reza Jafari,
Behnam Heidary,
Reza Farahbakhsh,
Mostafa Salehi,
Mahdi Jalili
Abstract:
These days, platforms such as social media give clients from different backgrounds and languages the possibility to connect and exchange information. It is no longer surprising to see comments in different languages on posts published by international celebrities or data providers. In this era, understanding cross-lingual content and multilingualism in natural language processing (NLP) are hot topics, and multiple efforts have tried to leverage existing technologies in NLP to tackle this challenging research problem. In this survey, we provide a comprehensive overview of the existing literature with a focus on transfer learning techniques in multilingual tasks. We also identify potential opportunities for further research in this domain.
Submitted 28 August, 2021;
originally announced October 2021.
-
Characterising and Detecting Sponsored Influencer Posts on Instagram
Authors:
Koosha Zarei,
Damilola Ibosiola,
Reza Farahbakhsh,
Zafar Gilani,
Kiran Garimella,
Noel Crespi,
Gareth Tyson
Abstract:
Recent years have seen a new form of advertisement campaigns emerge: those involving so-called social media influencers. These influencers accept money in return for promoting products via their social media feeds. Although this constitutes a new and interesting form of marketing, it also raises many questions, particularly related to transparency and regulation. For example, it can sometimes be unclear which accounts are officially influencers, or what even constitutes an influencer/advert. This is important in order to establish the integrity of influencers and to ensure compliance with advertisement regulation. We gather a large-scale Instagram dataset covering thousands of accounts advertising products, and create a categorisation based on the number of users they reach. We then provide a detailed analysis of the types of products being advertised by these accounts, their potential reach, and the engagement they receive from their followers. Based on our findings, we train machine learning models to distinguish sponsored content from non-sponsored, and identify cases where people are generating sponsored posts without officially labelling them. Our findings provide a first step towards understanding the under-studied space of online influencers that could be useful for researchers, marketers and policymakers.
Submitted 11 November, 2020;
originally announced November 2020.
-
Impersonation on Social Media: A Deep Neural Approach to Identify Ingenuine Content
Authors:
Koosha Zarei,
Reza Farahbakhsh,
Noel Crespi,
Gareth Tyson
Abstract:
Impersonators are playing an important role in the production and propagation of content on Online Social Networks, notably on Instagram. These entities are nefarious fake accounts that intend to disguise themselves as a legitimate account by creating similar profiles and then striking social media with fake content, which makes it considerably harder to understand which posts are genuinely produced. In this study, we focus on three important communities with legitimate verified accounts. Among them, we identify a collection of 2.2K impersonator profiles with nearly 10K generated posts, 68K comments, and 90K likes. Then, based on profile characteristics and user behaviours, we cluster them into two collections, 'bot' and 'fan'. In order to separate impersonator-generated posts from genuine content, we propose a Deep Neural Network architecture that measures 'profiles' and 'posts' features to predict the content type: 'bot-generated', 'fan-generated', or 'genuine' content. Our study sheds light on this interesting phenomenon and provides interesting observations on bot-generated content that can help us to understand the role of impersonators in the production of fake content on Instagram.
Submitted 16 October, 2020;
originally announced October 2020.
-
Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model
Authors:
Marzieh Mozafari,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Disparate biases associated with datasets and trained classifiers in hateful and abusive content identification tasks have raised many concerns recently. Although the problem of biased datasets in abusive language detection has been addressed more frequently, biases arising from trained classifiers have not yet been a matter of concern. Here, we first introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT and evaluate the proposed model on two publicly available datasets annotated for racism, sexism, hate or offensive content on Twitter. Next, we introduce a bias alleviation mechanism for the hate speech detection task to mitigate the effect of bias in the training set during the fine-tuning of our pre-trained BERT-based model. Toward that end, we use an existing regularization method to reweight input samples, thereby decreasing the effect of training-set n-grams that are highly correlated with class labels, and then fine-tune our pre-trained BERT-based model with the re-weighted samples. To evaluate our bias alleviation mechanism, we employ a cross-domain approach in which we use the classifiers trained on the aforementioned datasets to predict the labels of two new datasets from Twitter, the AAE-aligned and White-aligned groups, which contain tweets written in African-American English (AAE) and Standard American English (SAE) respectively. The results show the existence of systematic racial bias in trained classifiers, as they tend to assign tweets written in AAE from the AAE-aligned group to negative classes such as racism, sexism, hate, and offensive more often than tweets written in SAE from the White-aligned group. However, the racial bias in our classifiers is reduced significantly after our bias alleviation mechanism is incorporated. This work could constitute a first step towards debiasing hate speech and abusive language detection systems.
Submitted 28 August, 2020; v1 submitted 14 August, 2020;
originally announced August 2020.
-
A First Instagram Dataset on COVID-19
Authors:
Koosha Zarei,
Reza Farahbakhsh,
Noel Crespi,
Gareth Tyson
Abstract:
The novel coronavirus (COVID-19) pandemic outbreak is drastically shaping and reshaping many aspects of our life, with a huge impact on our social life. In this era of lockdown policies in most of the major cities around the world, we see a huge increase in engagement by individuals and professionals in social media. Social media is playing an important role in news propagation as well as keeping people in contact. At the same time, this source is both a blessing and a curse as the coronavirus infodemic has become a major concern, and is already a topic that needs special attention and further research. In this paper, we provide a multilingual coronavirus (COVID-19) Instagram dataset that we have been continuously collecting since March 30, 2020. We are making our dataset available to the research community on GitHub. We believe that this contribution will help the community to better understand the dynamics behind this phenomenon on Instagram, one of the major social media platforms. This dataset could also help study the propagation of misinformation related to this outbreak.
Submitted 25 April, 2020;
originally announced April 2020.
-
How Impersonators Exploit Instagram to Generate Fake Engagement?
Authors:
Koosha Zarei,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Impersonators on Online Social Networks such as Instagram are playing an important role in the propagation of content. These entities are a type of nefarious fake account that intends to disguise itself as a legitimate account by creating a similar profile. In addition to having impersonated profiles, we observed considerable engagement from these entities with the published posts of verified accounts. Toward that end, we concentrate on the engagement of impersonators in terms of active and passive engagements, which are studied in three major communities on Instagram: "Politician", "News agency", and "Sports star". Inside each community, four verified accounts have been selected. Based on the approach implemented in our previous studies, we have collected 4.8K comments and 2.6K likes across 566 posts from 3.8K impersonators during 7 months. Our study sheds light on this interesting phenomenon and provides a surprising observation that can help us to better understand how impersonators engage on Instagram in terms of writing comments and leaving likes.
Submitted 17 February, 2020;
originally announced February 2020.
-
A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media
Authors:
Marzieh Mozafari,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Hateful and toxic content generated by a portion of users in social media is a rising phenomenon that has motivated researchers to dedicate substantial efforts to the challenging direction of hateful content identification. We not only need an efficient automatic hate speech detection model based on advanced machine learning and natural language processing, but also a sufficiently large amount of annotated data to train a model. The lack of a sufficient amount of labelled hate speech data, along with the existing biases, has been the main issue in this domain of research. To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). More specifically, we investigate the ability of BERT to capture hateful context within social media content by using new fine-tuning methods based on transfer learning. To evaluate our proposed approach, we use two publicly available datasets that have been annotated for racism, sexism, hate, or offensive content on Twitter. The results show that our solution obtains considerable performance on these datasets in terms of precision and recall in comparison to existing approaches. Consequently, our model can capture some biases in the data annotation and collection process and can potentially lead us to a more accurate model.
Submitted 28 October, 2019;
originally announced October 2019.
-
Uncovering Flaming Events on News Media in Social Media
Authors:
Praboda Rajapaksha,
Reza Farahbakhsh,
Noel Crespi,
Bruno Defude
Abstract:
Social networking sites (SNSs) facilitate the sharing of ideas and information through different types of feedback, including publishing posts, leaving comments and other types of reactions. However, some comments or feedback on SNSs are inconsiderate and offensive, and sometimes this type of feedback has a very negative effect on a target user. The phenomenon known as flaming goes hand-in-hand with this type of posting, which can be triggered almost instantly on SNSs. The most popular users, such as celebrities, politicians and news media, are the major victims of flaming behaviors, and so detecting these types of events will be useful and appreciated. Flaming events can be monitored and identified by analyzing negative comments received on a post. Thus, the main objective of this study is to identify a way to detect flaming events in SNSs using a sentiment prediction method. We use a deep Neural Network (NN) model that can identify the sentiment of variable-length sentences and classifies the sentiment of SNS content (both comments and posts) to discover flaming events. Our deep NN model is trained with Word2Vec and FastText word embedding methods to explore which method is the most appropriate. The labeled dataset for training the deep NN is generated using an enhanced lexicon-based approach. Our deep NN model classifies the sentiment of a sentence into five classes: Very Positive, Positive, Neutral, Negative and Very Negative. To detect flaming incidents, we focus only on the comments classified into the Negative and Very Negative classes. As a use-case, we explore the flaming phenomenon in the news media domain, and therefore we focus on news items posted by three popular news media on Facebook (BBCNews, CNN and FoxNews) to train and test the model.
Submitted 16 September, 2019;
originally announced September 2019.
-
Inspecting Interactions: Online News Media Synergies in Social Media
Authors:
Praboda Rajapaksha,
Reza Farahbakhsh,
Noel Crespi,
Bruno Defude
Abstract:
The rising popularity of social media has radically changed the way news content is propagated, including interactive attempts with new dimensions. To date, traditional news media such as newspapers, television and radio have already adapted their activities to the online news media by utilizing social media, blogs, websites etc. This paper provides some insight into the social media presence of w…
▽ More
The rising popularity of social media has radically changed the way news content is propagated, including interactive attempts with new dimensions. To date, traditional news media such as newspapers, television and radio have already adapted their activities to the online news media by utilizing social media, blogs, websites etc. This paper provides some insight into the social media presence of worldwide popular news media outlets. Despite the fact that these large news media propagate content via social media environments to a large extent and very little is known about the news item producers, providers and consumers in the news media community in social media.To better understand these interactions, this work aims to analyze news items in two large social media, Twitter and Facebook. Towards that end, we collected all published posts on Twitter and Facebook from 48 news media to perform descriptive and predictive analyses using the dataset of 152K tweets and 80K Facebook posts. We explored a set of news media that originate content by themselves in social media, those who distribute their news items to other news media and those who consume news content from other news media and/or share replicas. We propose a predictive model to increase news media popularity among readers based on the number of posts, number of followers and number of interactions performed within the news media community. The results manifested that, news media should disperse their own content and they should publish first in social media in order to become a popular news media and receive more attractions to their news items from news readers.
Submitted 16 September, 2018;
originally announced September 2018.
-
Popularity Evolution of Professional Users on Facebook
Authors:
Samin Mohammadi,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Popularity in social media is an important objective for professional users (e.g. companies, celebrities, and public figures). A simple yet prominent metric used to measure the popularity of a user is the number of fans or followers she succeeds in attracting to her page. Popularity is influenced by several factors, and identifying them is an interesting research topic. This paper aims to understand this phenomenon in social media by exploring the popularity evolution of professional users on Facebook. To this end, we implemented a crawler and monitored the popularity evolution trend of the 8k most popular professional users on Facebook over a period of 14 months. The collected dataset includes around 20 million popularity values and 43 million posts. We characterized different popularity evolution patterns by clustering the users' number of fans over time and studied them from various perspectives, including their categories and levels of activity. Our observations show that being active and being famous correlate positively with the popularity trend.
Submitted 5 May, 2017;
originally announced May 2017.
-
How far is Facebook from me? Facebook network infrastructure analysis
Authors:
Reza Farahbakhsh,
Angel Cuevas,
Antonio M. Ortiz,
Xiao Han,
Noel Crespi
Abstract:
Facebook is today the most popular social network with more than one billion subscribers worldwide. To provide good quality of service (e.g., low access delay) to its clients, FB relies on Akamai, which provides a worldwide content distribution network with a large number of edge servers that are much closer to FB subscribers. In this article we aim to depict a global picture of the current FB network infrastructure deployment taking into account both native FB servers and Akamai nodes. Toward this end, we have performed a measurement-based analysis during a period of two weeks using 463 PlanetLab nodes distributed across 41 countries. Based on the obtained data we compare the average access delay that nodes in different countries experience accessing both native FB servers and Akamai nodes. In addition, we obtain a wide view of the deployment of Akamai nodes serving FB users worldwide. Finally, we analyze the geographical coverage of those nodes, and demonstrate that in most cases Akamai nodes located in a particular country serve not only local FB subscribers, but also FB users located in nearby countries.
Submitted 1 May, 2017;
originally announced May 2017.
-
Characterization of Cross-posting Activity for Professional Users across Facebook, Twitter and Google+
Authors:
Reza Farahbakhsh,
Angel Cuevas,
Noel Crespi
Abstract:
Professional players in social media (e.g., big companies, politicians, athletes, celebrities, etc.) are intensively using Online Social Networks (OSNs) to interact with a huge number of regular OSN users for different purposes (marketing campaigns, customer feedback, public reputation improvement, etc.). Because of the large catalog of existing OSNs, professional players usually hold accounts in several of them. In this context an interesting question is whether professional users publish the same information across their OSN accounts, or whether they use different OSNs in a different manner. We define cross-posting activity as the action of publishing the same information in two or more OSNs. This paper aims at characterizing the cross-posting activity of professional users across three major OSNs: Facebook, Twitter and Google+. To this end, we perform a large-scale measurement-based analysis across more than 2M posts collected from 616 professional users with active accounts in the three OSNs. We then characterize the phenomenon of cross-posting and analyze the behavioral patterns based on the identified characteristics.
Submitted 1 May, 2017;
originally announced May 2017.
-
Understanding the evolution of multimedia content in the Internet through BitTorrent glasses
Authors:
Reza Farahbakhsh,
Angel Cuevas,
Ruben Cuevas,
Roberto Gonzalez,
Noel Crespi
Abstract:
Today's Internet traffic is mostly dominated by multimedia content and the prediction is that this trend will intensify in the future. Therefore, major Internet players, such as ISPs, content delivery platforms (e.g. YouTube, BitTorrent, Netflix, etc.) or CDN operators, need to understand the evolution of multimedia content availability and popularity in order to adapt their infrastructures and resources to satisfy clients' requirements while minimizing their costs. This paper presents a thorough analysis of the evolution of multimedia content available in BitTorrent. Specifically, we analyze the evolution of four relevant metrics across different content categories: content availability, content popularity, content size and user feedback. To this end we leverage a large-scale dataset formed by 4 snapshots collected from the most popular BitTorrent portal, namely The Pirate Bay, between Nov. 2009 and Feb. 2012. Overall our dataset comprises more than 160k content items that attracted more than 185M download sessions.
Submitted 1 May, 2017;
originally announced May 2017.
-
Analysis of publicly disclosed information in Facebook profiles
Authors:
Reza Farahbakhsh,
Xiao Han,
Angel Cuevas,
Noel Crespi
Abstract:
Facebook, the most popular online social network, is a virtual environment where users share information and are in contact with friends. Apart from many useful aspects, there is a large amount of personal and sensitive information publicly available that is accessible to external entities/users. In this paper we study the public exposure of Facebook profile attributes to understand which types of attributes Facebook users consider more sensitive in terms of privacy, and thus rarely disclose, and which attributes are available in most Facebook profiles. Furthermore, we analyze the public exposure of Facebook users by counting the number of attributes that users make publicly available on average. To complete our analysis we have crawled the profile information of 479K randomly selected Facebook users. Finally, in order to demonstrate the utility of the publicly available information in Facebook profiles, we present three case studies. The first one carries out a gender-based analysis to understand whether men or women share more or less information. The second case study depicts the age distribution of Facebook users. The last case study uses data inferred from Facebook profiles to map the distribution of the worldwide population across cities according to their size.
Submitted 1 May, 2017;
originally announced May 2017.
-
Middleware Technologies for Cloud of Things - a survey
Authors:
Amirhossein Farahzadi,
Pooyan Shams,
Javad Rezazadeh,
Reza Farahbakhsh
Abstract:
The next wave of communication and applications relies on the new services provided by the Internet of Things (IoT), which is becoming an important aspect of the future of humans and machines. IoT services are a key solution for providing smart environments in homes, buildings and cities. In an era of a massive number of connected things and objects with a high growth rate, several challenges have been raised, such as management, aggregation and storage of the large volumes of data produced. In order to tackle some of these issues, cloud computing has been brought to the IoT as the Cloud of Things (CoT), which provides virtually unlimited cloud services to enhance large-scale IoT platforms. There are several factors to be considered in the design and implementation of a CoT platform. One of the most important and challenging problems is the heterogeneity of different objects. This problem can be addressed by deploying suitable middleware. Middleware sits between things and applications and provides a reliable platform for communication among things with different interfaces, operating systems, and architectures. The main aim of this paper is to study middleware technologies for CoT. Toward this end, we first present the main features and characteristics of middleware. Next we study different architecture styles and service domains. Then we present several middleware solutions that are suitable for CoT-based platforms, and lastly we discuss a list of current challenges and issues in the design of CoT-based middleware.
Submitted 30 April, 2017;
originally announced May 2017.
-
An in-depth characterisation of Bots and Humans on Twitter
Authors:
Zafar Gilani,
Reza Farahbakhsh,
Gareth Tyson,
Liang Wang,
Jon Crowcroft
Abstract:
Recent research has shown a substantial active presence of bots in online social networks (OSNs). In this paper we utilise our past work on studying bots (Stweeler) to comparatively analyse the usage and impact of bots and humans on Twitter, one of the largest OSNs in the world. We collect a large-scale Twitter dataset and define various metrics based on tweet metadata. We divide and filter the dataset into four popularity groups in terms of number of followers. Using a human annotation task we assign 'bot' and 'human' ground-truth labels to the dataset, and compare the annotations against an online bot detection tool for evaluation. We then ask a series of questions to discern important behavioural bot and human characteristics using metrics within and among the four popularity groups. From the comparative analysis we draw important differences as well as surprising similarities between the two entities, thus paving the way for reliable classification of automated political infiltration, advertisement campaigns, and general bot detection.
Submitted 5 April, 2017;
originally announced April 2017.
-
NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media
Authors:
Saeedreza Shehnepoor,
Mostafa Salehi,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Nowadays, many people rely on content available in social media when making decisions (e.g. reviews and feedback on a topic or product). The possibility that anybody can leave a review provides a golden opportunity for spammers to write spam reviews about products and services for different interests. Identifying these spammers and the spam content is a hot topic of research, and although a considerable number of studies have been done recently toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them shows the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which utilizes spam features for modeling review datasets as heterogeneous information networks to map the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us to obtain better results in terms of different metrics on real-world review datasets from the Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and that, among the four categories of features (review-behavioral, user-behavioral, review-linguistic and user-linguistic), the first type performs better than the other categories.
Submitted 10 March, 2017;
originally announced March 2017.
-
A Trust Model for Data Sharing in Smart Cities
Authors:
Quyet H. Cao,
Imran Khan,
Reza Farahbakhsh,
Giyyarpuram Madhusudan,
Gyu Myoung Lee,
Noel Crespi
Abstract:
The data generated by the devices and existing infrastructure in the Internet of Things (IoT) should be shared among applications. However, data sharing in the IoT can only reach its full potential when multiple participants contribute their data, for example when people are able to use their smartphone sensors for this purpose. We believe that each step, from sensing the data to the actionable knowledge, requires trust-enabled mechanisms to facilitate data exchange, such as data perception trust, trustworthy data mining, and reasoning with trust-related policies. The absence of trust could affect the acceptance of sharing data in smart cities. In this study, we focus on data usage transparency and accountability and propose a trust model for data sharing in smart cities, including a system architecture for trust-based data sharing, data semantic and abstraction models, and a mechanism to enhance transparency and accountability for data usage. We apply semantic technology and defeasible reasoning with trust data usage policies. We built a prototype based on an air pollution monitoring use case and utilized it to evaluate the performance of our solution.
Submitted 24 March, 2016;
originally announced March 2016.
-
Seamless Handover for IMS over Mobile-IPv6 Using Context Transfer
Authors:
Reza Farahbakhsh,
Naser Movahhedinia
Abstract:
Mobility support for next-generation IPv6 networks has been one of the recent research issues due to the growing demand for wireless services over the Internet. On the other hand, 3GPP has introduced the IP Multimedia Subsystem (IMS) as the next-generation IP-based infrastructure for wireless and wired multimedia services. In this paper we present two context transfer mechanisms, based on predictive and reactive schemes, to support seamless handover in IMS over Mobile IPv6. These schemes reduce handover latency by transferring appropriate session information between the old and the new access networks. Moreover, we present two methods for QoS parameter negotiation to preserve service quality along the mobile user's movement path. The performance of the proposed mechanisms is evaluated by simulations.
△ Less
Submitted 6 August, 2012;
originally announced August 2012.