Search | arXiv e-print repository

doi 10.36190/2023.06

Unveiling the Dynamics of Censorship, COVID-19 Regulations, and Protest: An Empirical Study of Chinese Subreddit r/china_irl

Authors: Siyi Zhou, Luca Luceri, Emilio Ferrara

Abstract: The COVID-19 pandemic has intensified numerous social issues that warrant academic investigation. Although information dissemination has been extensively studied, the silenced voices and censored content also merit attention due to their role in mobilizing social movements. In this paper, we provide empirical evidence to explore the relationships among COVID-19 regulations, censorship, and protest through a series of social incidents occurred in China during 2022. We analyze the similarities and differences between censored articles and discussions on r/china_irl, the most popular Chinese-speaking subreddit, and scrutinize the temporal dynamics of government censorship activities and their impact on user engagement within the subreddit. Furthermore, we examine users' linguistic patterns under the influence of a censorship-driven environment. Our findings reveal patterns in topic recurrence, the complex interplay between censorship activities, user subscription, and collective commenting behavior, as well as potential linguistic adaptation strategies to circumvent censorship. These insights hold significant implications for researchers interested in understanding the survival mechanisms of marginalized groups within censored information ecosystems.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2207.08349 [pdf, other]

Retweet-BERT: Political Leaning Detection Using Language Features and Information Diffusion on Social Networks

Authors: Julie Jiang, Xiang Ren, Emilio Ferrara

Abstract: Estimating the political leanings of social media users is a challenging and ever more pressing problem given the increase in social media consumption. We introduce Retweet-BERT, a simple and scalable model to estimate the political leanings of Twitter users. Retweet-BERT leverages the retweet network structure and the language used in users' profile descriptions. Our assumptions stem from patterns of networks and linguistics homophily among people who share similar ideologies. Retweet-BERT demonstrates competitive performance against other state-of-the-art baselines, achieving 96%-97% macro-F1 on two recent Twitter datasets (a COVID-19 dataset and a 2020 United States presidential elections dataset). We also perform manual validation to validate the performance of Retweet-BERT on users not in the training data. Finally, in a case study of COVID-19, we illustrate the presence of political echo chambers on Twitter and show that it exists primarily among right-leaning users. Our code is open-sourced and our data is publicly available.

Submitted 6 April, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: 11 pages, 3 figures, 4 tables. arXiv admin note: text overlap with arXiv:2103.10979

Journal ref: The 17th International AAAI Conference on Web and Social Media (ICWSM 2023)

arXiv:2111.09361 [pdf, other]

doi 10.3847/1538-4357/ac5829

Bayesian Solar Wind Modeling with Pulsar Timing Arrays

Authors: Jeffrey S. Hazboun, Joseph Simon, Dustin R. Madison, Zaven Arzoumanian, Kathryn Crowter, Megan E. DeCesar, Paul B. Demorest, Timothy Dolch, Justin A. Ellis, Robert D. Ferdman, Elizabeth C. Ferrara, Emmanuel Fonseca, Peter A. Gentile, Glenn Jones, Megan L. Jones, Michael T. Lam, Lina Levin, Duncan R. Lorimer, Ryan S. Lynch, Maura A. McLaughlin, Cherry Ng, David J. Nice, Timothy T. Pennucci, Scott M. Ransom, Paul S. Ray, et al. (5 additional authors not shown)

Abstract: Using Bayesian analyses we study the solar electron density with the NANOGrav 11-year pulsar timing array (PTA) dataset. Our model of the solar wind is incorporated into a global fit starting from pulse times-of-arrival. We introduce new tools developed for this global fit, including analytic expressions for solar electron column densities and open source models for the solar wind that port into existing PTA software. We perform an ab initio recovery of various solar wind model parameters. We then demonstrate the richness of information about the solar electron density, $n_E$, that can be gleaned from PTA data, including higher order corrections to the simple $1/r^2$ model associated with a free-streaming wind (which are informative probes of coronal acceleration physics), quarterly binned measurements of $n_E$ and a continuous time-varying model for $n_E$ spanning approximately one solar cycle period. Finally, we discuss the importance of our model for chromatic noise mitigation in gravitational-wave analyses of pulsar timing data and the potential of developing synergies between sophisticated PTA solar electron density models and those developed by the solar physics community.

Submitted 17 November, 2021; originally announced November 2021.

Comments: 22 pages, 7 figures, Submitted to ApJ

arXiv:2006.06142 [pdf]

doi 10.2196/25379

Gender disparity in the authorship of biomedical research publications during the COVID-19 pandemic

Authors: Goran Muric, Kristina Lerman, Emilio Ferrara

Abstract: Preliminary evidence suggests that women, including female researchers, are disproportionately affected by the COVID-19 pandemic in terms of unequal distribution of childcare, elderly care and other kinds of domestic and emotional labor. Sudden lockdowns and abrupt shifts in daily routines have disproportionate consequences on their productivity, which is reflected by a sudden drop in research output in biomedical research, consequently affecting the number of female authors of scientific publications. We investigate the proportion of male and female researchers who published scientific papers during the COVID-19 pandemic, using bibliometric data from biomedical preprint servers and selected Springer-Nature journals. Our findings document a decrease in the number of publications by female authors in biomedical field during the global pandemic. This effect is particularly pronounced for papers related to COVID-19, indicating that women are producing fewer publications related to COVID-19 research. This sudden increase in the gender gap is persistent across the ten countries with the highest number of researchers. These results should be used to inform the scientific community of the worrying trend in COVID-19 research and the disproportionate effect that the pandemic has on female academics.

Submitted 24 March, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

arXiv:2004.09531 [pdf]

doi 10.5210/fm.v25i6.10633

What Types of COVID-19 Conspiracies are Populated by Twitter Bots?

Authors: Emilio Ferrara

Abstract: With people moving out of physical public spaces due to containment measures to tackle the novel coronavirus (COVID-19) pandemic, online platforms become even more prominent tools to understand social discussion. Studying social media can be informative to assess how we are collectively coping with this unprecedented global crisis. However, social media platforms are also populated by bots, automated accounts that can amplify certain topics of discussion at the expense of others. In this paper, we study 43.3M English tweets about COVID-19 and provide early evidence of the use of bots to promote political conspiracies in the United States, in stark contrast with humans who focus on public health concerns.

Submitted 2 June, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: Published in: First Monday, 25(6), 2020; https://firstmonday.org/ojs/index.php/fm/article/view/10633

Journal ref: First Monday, 25(6), 2020

arXiv:1910.05870 [pdf, other]

doi 10.1103/PhysRevE.102.052316

Network Modularity Controls the Speed of Information Diffusion

Authors: Hao Peng, Azadeh Nematzadeh, Daniel M. Romero, Emilio Ferrara

Abstract: The rapid diffusion of information and the adoption of social behaviors are of critical importance in situations as diverse as collective actions, pandemic prevention, or advertising and marketing. Although the dynamics of large cascades have been extensively studied in various contexts, few have systematically examined the impact of network topology on the efficiency of information diffusion. Here, by employing the linear threshold model on networks with communities, we demonstrate that a prominent network feature---the modular structure---strongly affects the speed of information diffusion in complex contagion. Our simulations show that there always exists an optimal network modularity for the most efficient spreading process. Beyond this critical value, either a stronger or a weaker modular structure actually hinders the diffusion speed. These results are confirmed by an analytical approximation. We further demonstrate that the optimal modularity varies with both the seed size and the target cascade size, and is ultimately dependent on the network under investigation. We underscore the importance of our findings in applications from marketing to epidemiology, from neuroscience to engineering, where the understanding of the structural design of complex systems focuses on the efficiency of information propagation.

Submitted 30 July, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

arXiv:1906.07641 [pdf, other]

doi 10.1016/j.diamond.2019.107489

Multi-analytical characterization of Fe-rich magnetic inclusions in diamonds

Authors: Marco Piazzi, Marta Morana, Marco Coïsson, Federica Marone, Marcello Campione, Luca Bindi, Adrian P. Jones, Enzo Ferrara, Matteo Alvaro

Abstract: Magnetic mineral inclusions, as iron oxides or sulfides, occur quite rarely in natural diamonds. Nonetheless, they represent a key tool not only to unveil the conditions of formation of host diamonds, but also to get hints about the paleointensity of the geomagnetic field present at times of the Earth's history otherwise not accessible. This possibility is related to their capability to carry a remanent magnetization dependent on their magnetic history. However, comprehensive experimental studies on magnetic inclusions in diamonds have been rarely reported so far. Here we exploit X-ray diffraction, Synchrotron-based X-ray Tomographic Microscopy and Alternating Field Magnetometry to determine the crystallographic, morphological and magnetic properties of ferrimagnetic Fe-oxides entrapped in diamonds coming from Akwatia (Ghana). We exploit the methodology to estimate the natural remanence of the inclusions, associated to the Earth's magnetic field they experienced, and to get insights on the relative time of formation between host and inclusion systems. Furthermore, from the hysteresis loops and First Order Reversal Curves we determine qualitatively the anisotropy, size and domain state configuration of the magnetic grains constituting the inclusions.

Submitted 2 August, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: v1: PDFLaTeX,12 pages,9 figs,1 table; v2: PDFLaTeX,10 pages,8 figs,1 table. All sections reorganized and shortened; some references deleted; Introduction and Discussion sections clarified; Figs. 4,5 of version v1 merged in new-Fig. 4; typos corrected; v3: PDFLaTeX,10 pages,8 figs,1 table. Slight changes in Introduction; few typos corrected. Accepted for publication in Diamond and Related Materials

Journal ref: Diam. Relat. Mat. 98, 107489 (2019)

arXiv:1808.03281 [pdf, other]

Who Falls for Online Political Manipulation?

Authors: Adam Badawy, Kristina Lerman, Emilio Ferrara

Abstract: Social media, once hailed as a vehicle for democratization and the promotion of positive social change across the globe, are under attack for becoming a tool of political manipulation and spread of disinformation. A case in point is the alleged use of trolls by Russia to spread malicious content in Western elections. This paper examines the Russian interference campaign in the 2016 US presidential election on Twitter. Our aim is twofold: first, we test whether predicting users who spread trolls' content is feasible in order to gain insight on how to contain their influence in the future; second, we identify features that are most predictive of users who either intentionally or unintentionally play a vital role in spreading this malicious content. We collected a dataset with over 43 million elections-related posts shared on Twitter between September 16 and November 9, 2016, by about 5.7 million users. This dataset includes accounts associated with the Russian trolls identified by the US Congress. Proposed models are able to very accurately identify users who spread the trolls' content (average AUC score of 96%, using 10-fold validation). We show that political ideology, bot likelihood scores, and some activity-related account meta data are the most predictive features of whether a user spreads trolls' content or not.

Submitted 9 August, 2018; originally announced August 2018.

arXiv:1805.03285 [pdf, other]

doi 10.3389/fdata.2019.00014

Deep Neural Networks for Optimal Team Composition

Authors: Anna Sapienza, Palash Goyal, Emilio Ferrara

Abstract: Cooperation is a fundamental social mechanism, whose effects on human performance have been investigated in several environments. Online games are modern-days natural settings in which cooperation strongly affects human behavior. Every day, millions of players connect and play together in team-based games: the patterns of cooperation can either foster or hinder individual skill learning and performance. This work has three goals: (i) identifying teammates' influence on players' performance in the short and long term, (ii) designing a computational framework to recommend teammates to improve players' performance, and (iii) setting to demonstrate that such improvements can be predicted via deep learning. We leverage a large dataset from Dota 2, a popular Multiplayer Online Battle Arena game. We generate a directed co-play network, whose links' weights depict the effect of teammates on players' performance. Specifically, we propose a measure of network influence that captures skill transfer from player to player over time. We then use such framing to design a recommendation system to suggest new teammates based on a modified deep neural autoencoder and we demonstrate its state-of-the-art recommendation performance. We finally provide insights into skill transfer effects: our experimental results demonstrate that such dynamics can be predicted using deep neural networks.

Submitted 8 May, 2018; originally announced May 2018.

arXiv:1802.07292 [pdf, other]

doi 10.1073/pnas.1803470115

Bots increase exposure to negative and inflammatory content in online social systems

Authors: Massimo Stella, Emilio Ferrara, Manlio De Domenico

Abstract: Societies are complex systems which tend to polarize into sub-groups of individuals with dramatically opposite perspectives. This phenomenon is reflected -- and often amplified -- in online social networks where, however, humans are no more the only players, and co-exist alongside with social bots, i.e., software-controlled accounts. Analyzing large-scale social data collected during the Catalan referendum for independence on October 1, 2017, consisting of nearly 4 millions Twitter posts generated by almost 1 million users, we identify the two polarized groups of Independentists and Constitutionalists and quantify the structural and emotional roles played by social bots. We show that bots act from peripheral areas of the social system to target influential humans of both groups, bombarding Independentists with violent contents, increasing their exposure to negative and inflammatory narratives and exacerbating social conflict online. Our findings stress the importance of developing countermeasures to unmask these forms of automated social manipulation.

Submitted 28 February, 2019; v1 submitted 20 February, 2018; originally announced February 2018.

Comments: 8 pages, 5 figures

Journal ref: PNAS 115 (49) 12435-12440 (2018)

arXiv:1801.09783 [pdf, other]

doi 10.1109/ICDMW.2017.124

Performance Dynamics and Success in Online Games

Authors: Anna Sapienza, Hao Peng, Emilio Ferrara

Abstract: Online data provide a way to monitor how users behave in social systems like social networks and online games, and understand which features turn an ordinary individual into a successful one. Here, we propose to study individual performance and success in Multiplayer Online Battle Arena (MOBA) games. Our purpose is to identify those behaviors and playing styles that are characteristic of players with high skill level and that distinguish them from other players. To this aim, we study Defense of the ancient 2 (Dota 2), a popular MOBA game. Our findings highlight three main aspects to be successful in the game: (i) players need to have a warm-up period to enhance their performance in the game; (ii) having a long in-game experience does not necessarily translate in achieving better skills; but rather, (iii) players that reach high skill levels differentiate from others because of their aggressive playing strategy, which implies to kill opponents more often than cooperating with teammates, and trying to give an early end to the match.

Submitted 29 January, 2018; originally announced January 2018.

Journal ref: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pp:902-909, 2017

arXiv:1708.08134 [pdf, other]

doi 10.1007/978-3-319-77332-2_13

Measuring social spam and the effect of bots on information diffusion in social media

Authors: Emilio Ferrara

Abstract: Bots have been playing a crucial role in online platform ecosystems, as efficient and automatic tools to generate content and diffuse information to the social media human population. In this chapter, we will discuss the role of social bots in content spreading dynamics in social media. In particular, we will first investigate some differences between diffusion dynamics of content generated by bots, as opposed to humans, in the context of political communication, then study the characteristics of bots behind the diffusion dynamics of social media spam campaigns.

Submitted 27 August, 2017; originally announced August 2017.

Comments: Chapter of the book "Spreading Dynamics in Social Systems" edited by Y.Y. Ahn and Sune Lehmann

Journal ref: In: Lehmann S., Ahn YY. (eds) Complex Spreading Phenomena in Social Systems, pp. 229-255. Springer, 2018

arXiv:1707.00086 [pdf]

doi 10.5210/fm.v22i8.8005

Disinformation and Social Bot Operations in the Run Up to the 2017 French Presidential Election

Authors: Emilio Ferrara

Abstract: Recent accounts from researchers, journalists, as well as federal investigators, reached a unanimous conclusion: social media are systematically exploited to manipulate and alter public opinion. Some disinformation campaigns have been coordinated by means of bots, social media accounts controlled by computer scripts that try to disguise themselves as legitimate human users. In this study, we describe one such operation occurred in the run up to the 2017 French presidential election. We collected a massive Twitter dataset of nearly 17 million posts occurred between April 27 and May 7, 2017 (Election Day). We then set to study the MacronLeaks disinformation campaign: By leveraging a mix of machine learning and cognitive behavioral modeling techniques, we separated humans from bots, and then studied the activities of the two groups taken independently, as well as their interplay. We provide a characterization of both the bots and the users who engaged with them and oppose it to those users who didn't. Prior interests of disinformation adopters pinpoint to the reasons of the scarce success of this campaign: the users who engaged with MacronLeaks are mostly foreigners with a preexisting interest in alt-right topics and alternative news media, rather than French users with diverse political views. Concluding, anomalous account usage patterns suggest the possible existence of a black-market for reusable political disinformation bots.

Submitted 30 June, 2017; originally announced July 2017.

Comments: 33 pages, 6 figures, 9 tables; submitted to First Monday

Journal ref: First Monday, 22(8), 2017

arXiv:1705.02801 [pdf, other]

doi 10.1016/j.knosys.2018.03.022

Graph Embedding Techniques, Applications, and Performance: A Survey

Authors: Palash Goyal, Emilio Ferrara

Abstract: Graphs, such as social networks, word co-occurrence networks, and communication networks, occur naturally in various real-world applications. Analyzing them yields insight into the structure of society, language, and different patterns of communication. Many approaches have been proposed to perform the analysis. Recently, methods which use the representation of graph nodes in vector space have gained traction from the research community. In this survey, we provide a comprehensive and structured analysis of various graph embedding techniques proposed in the literature. We first introduce the embedding task and its challenges such as scalability, choice of dimensionality, and features to be preserved, and their possible solutions. We then present three categories of approaches based on factorization methods, random walks, and deep learning, with examples of representative algorithms in each category and analysis of their performance on various tasks. We evaluate these state-of-the-art methods on a few common datasets and compare their performance against one another. Our analysis concludes by suggesting some potential applications and future directions. We finally present the open-source Python library we developed, named GEM (Graph Embedding Methods, available at https://github.com/palash1992/GEM), which provides all presented algorithms within a unified interface to foster and facilitate research on the topic.

Submitted 22 December, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

Comments: Submitted to Knowledge Based Systems for review

Journal ref: Knowledge Based Systems, Volume 151, 1 July 2018, Pages 78-94, 2018

arXiv:1703.06027 [pdf, other]

doi 10.1371/journal.pone.0184148

Evidence of Complex Contagion of Information in Social Media: An Experiment Using Twitter Bots

Authors: Bjarke Mønsted, Piotr Sapieżyński, Emilio Ferrara, Sune Lehmann

Abstract: It has recently become possible to study the dynamics of information diffusion in techno-social systems at scale, due to the emergence of online platforms, such as Twitter, with millions of users. One question that systematically recurs is whether information spreads according to simple or complex dynamics: does each exposure to a piece of information have an independent probability of a user adopting it (simple contagion), or does this probability depend instead on the number of sources of exposure, increasing above some threshold (complex contagion)? Most studies to date are observational and, therefore, unable to disentangle the effects of confounding factors such as social reinforcement, homophily, limited attention, or network community structure. Here we describe a novel controlled experiment that we performed on Twitter using `social bots' deployed to carry out coordinated attempts at spreading information. We propose two Bayesian statistical models describing simple and complex contagion dynamics, and test the competing hypotheses. We provide experimental evidence that the complex contagion model describes the observed information diffusion behavior more accurately than simple contagion. Future applications of our results include more effective defenses against malicious propaganda campaigns on social media, improved marketing and advertisement strategies, and design of effective network intervention techniques.

Submitted 17 March, 2017; originally announced March 2017.

Comments: 10 pages + 4 pages of supplementary information. 4+1 figures

arXiv:1702.05695 [pdf, other]

Non-negative Tensor Factorization for Human Behavioral Pattern Mining in Online Games

Authors: Anna Sapienza, Alessandro Bessi, Emilio Ferrara

Abstract: Multiplayer online battle arena has become a popular game genre. It also received increasing attention from our research community because they provide a wealth of information about human interactions and behaviors. A major problem is extracting meaningful patterns of activity from this type of data, in a way that is also easy to interpret. Here, we propose to exploit tensor decomposition techniques, and in particular Non-negative Tensor Factorization, to discover hidden correlated behavioral patterns of play in a popular game: League of Legends. We first collect the entire gaming history of a group of about one thousand players, totaling roughly $100K$ matches. By applying our methodological framework, we then separate players into groups that exhibit similar features and playing strategies, as well as similar temporal trajectories, i.e., behavioral progressions over the course of their gaming history: this will allow us to investigate how players learn and improve their skills.

Submitted 18 February, 2017; originally announced February 2017.

Comments: 9 pages, 6 figures, submitted to KDD'17

arXiv:1702.02263 [pdf]

doi 10.1007/s42001-018-0015-z

The Rise of Jihadist Propaganda on Social Networks

Authors: Adam Badawy, Emilio Ferrara

Abstract: Using a dataset of over 1.9 million messages posted on Twitter by about 25,000 ISIS members, we explore how ISIS makes use of social media to spread its propaganda and to recruit militants from the Arab world and across the globe. By distinguishing between violence-driven, theological, and sectarian content, we trace the connection between online rhetoric and key events on the ground. To the best of our knowledge, ours is one of the first studies to focus on Arabic content, while most literature focuses on English content. Our findings yield important new insights about how social media is used by radical militant groups to target the Arabic-speaking world, and reveal important patterns in their propaganda efforts.

Submitted 7 February, 2017; originally announced February 2017.

Comments: 22 pages, 9 figures, 7 tables

Journal ref: Journal of Computational Social Science, 2018

arXiv:1701.08170 [pdf, other]

Contagion dynamics of extremist propaganda in social networks

Authors: Emilio Ferrara

Abstract: Recent terrorist attacks carried out on behalf of ISIS on American and European soil by lone wolf attackers or sleeper cells remind us of the importance of understanding the dynamics of radicalization mediated by social media communication channels. In this paper, we shed light on the social media activity of a group of twenty-five thousand users whose association with ISIS online radical propaganda has been manually verified. By using a computational tool known as dynamical activity-connectivity maps, based on network and temporal activity patterns, we investigate the dynamics of social influence among ISIS supporters. We finally quantify the effectiveness of ISIS propaganda by determining the adoption of extremist content in the general population and draw a parallel between radical propaganda and epidemic spreading, highlighting that information broadcasters and influential ISIS supporters generate highly infectious cascades of information contagion. Our findings will help generate effective countermeasures to combat the group and other forms of online extremism.

Submitted 7 June, 2017; v1 submitted 27 January, 2017; originally announced January 2017.

Comments: 19 pages, 8 figures; to appear in Information Sciences

arXiv:1607.06819 [pdf, other]

doi 10.1007/978-3-319-47880-7_20

Social Politics: Agenda Setting and Political Communication on Social Media

Authors: Xinxin Yang, Bo-Chiuan Chen, Mrinmoy Maity, Emilio Ferrara

Abstract: Social media play an increasingly important role in political communication. Various studies investigated how individuals adopt social media for political discussion, to share their views about politics and policy, or to mobilize and protest against social issues. Yet, little attention has been devoted to the main actors of political discussions: the politicians. In this paper, we explore the topics of discussion of U.S. President Obama and the 50 U.S. State Governors using Twitter data and agenda-setting theory as a tool to describe the patterns of daily political discussion, uncovering the main topics of attention and interest of these actors. We examine over one hundred thousand tweets produced by these politicians and identify seven macro-topics of conversation, finding that Twitter represents a particularly appealing vehicle of conversation for American opposition politicians. We highlight the main motifs of political conversation of the two parties, discovering that Republican and Democrat Governors are more or less similarly active on Twitter but exhibit different styles of communication. Finally, by reconstructing the networks of occurrences of Governors' hashtags and keywords related to political issues, we observe that Republicans form a tight core, with a stronger shared agenda on many issues of discussion.

Submitted 22 July, 2016; originally announced July 2016.

Journal ref: International Conference on Social Informatics (pp. 330-344). Springer. 2016

arXiv:1605.00659 [pdf, other]

doi 10.1007/978-3-319-47874-6_3

Predicting online extremism, content adopters, and interaction reciprocity

Authors: Emilio Ferrara, Wen-Qiang Wang, Onur Varol, Alessandro Flammini, Aram Galstyan

Abstract: We present a machine learning framework that leverages a mixture of metadata, network, and temporal features to detect extremist users, and predict content adopters and interaction reciprocity in social media. We exploit a unique dataset containing millions of tweets generated by more than 25 thousand users who have been manually identified, reported, and suspended by Twitter due to their involvement with extremist campaigns. We also leverage millions of tweets generated by a random sample of 25 thousand regular users who were exposed to, or consumed, extremist content. We carry out three forecasting tasks, (i) to detect extremist users, (ii) to estimate whether regular users will adopt extremist content, and finally (iii) to predict whether users will reciprocate contacts initiated by extremists. All forecasting tasks are set up in two scenarios: a post hoc (time independent) prediction task on aggregated data, and a simulated real-time prediction task. The performance of our framework is extremely promising, yielding in the different forecasting scenarios up to 93% AUC for extremist user detection, up to 80% AUC for content adoption prediction, and finally up to 72% AUC for interaction reciprocity forecasting. We conclude by providing a thorough feature analysis that helps determine which are the emerging signals that provide predictive power in different scenarios.

Submitted 2 May, 2016; originally announced May 2016.

Comments: 9 pages, 3 figures, 8 tables

Journal ref: International Conference on Social Informatics (pp. 22-39). Springer. 2016

arXiv:1601.05140 [pdf]

doi 10.1109/MC.2016.183

The DARPA Twitter Bot Challenge

Authors: V. S. Subrahmanian, Amos Azaria, Skylar Durst, Vadim Kagan, Aram Galstyan, Kristina Lerman, Linhong Zhu, Emilio Ferrara, Alessandro Flammini, Filippo Menczer, Andrew Stevens, Alexander Dekhtyar, Shuyang Gao, Tad Hogg, Farshad Kooti, Yan Liu, Onur Varol, Prashant Shiralkar, Vinod Vydiswaran, Qiaozhu Mei, Tim Hwang

Abstract: A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and describes the methods used by the three top-ranked teams.

Submitted 21 April, 2016; v1 submitted 19 January, 2016; originally announced January 2016.

Comments: IEEE Computer Magazine, in press

Journal ref: Computer 49 (6), 38-46. IEEE, 2016

arXiv:1510.05318 [pdf, other]

doi 10.1145/2872427.2883031

Latent Space Model for Multi-Modal Social Data

Authors: Yoon-Sik Cho, Greg Ver Steeg, Emilio Ferrara, Aram Galstyan

Abstract: With the emergence of social networking services, researchers enjoy the increasing availability of large-scale heterogeneous datasets capturing online user interactions and behaviors. Traditional analysis of techno-social systems data has focused mainly on describing either the dynamics of social interactions, or the attributes and behaviors of the users. However, overwhelming empirical evidence suggests that the two dimensions affect one another, and therefore they should be jointly modeled and analyzed in a multi-modal framework. The benefits of such an approach include the ability to build better predictive models, leveraging social network information as well as user behavioral signals. To this purpose, here we propose the Constrained Latent Space Model (CLSM), a generalized framework that combines Mixed Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA) incorporating a constraint that forces the latent space to concurrently describe the multiple data modalities. We derive an efficient inference algorithm based on Variational Expectation Maximization that has a computational cost linear in the size of the network, thus making it feasible to analyze massive social datasets. We validate the proposed framework on two problems: prediction of social interactions from user attributes and behaviors, and behavior prediction exploiting network information. We perform experiments with a variety of multi-modal social systems, spanning location-based social networks (Gowalla), social media services (Instagram, Orkut), e-commerce and review sites (Amazon, Ciao), and finally citation networks (Cora). The results indicate significant improvement in prediction accuracy over state-of-the-art methods, and demonstrate the flexibility of the proposed approach for addressing a variety of different learning problems commonly occurring with multi-modal social data.

Submitted 18 October, 2015; originally announced October 2015.

Comments: 12 pages, 7 figures, 2 tables

Journal ref: Proceedings of the 25th International Conference on World Wide Web (pp. 447-458). 2016

arXiv:1509.01608 [pdf, other]

doi 10.1016/j.ins.2016.02.027

Network Structure and Resilience of Mafia Syndicates

Authors: Santa Agreste, Salvatore Catanese, Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara

Abstract: In this paper we present the results of a study of the Sicilian Mafia organization using Social Network Analysis. The study investigates the network structure of a Mafia organization, describing its evolution and highlighting its plasticity to interventions targeting membership and its resilience to disruption caused by police operations. We analyze two different datasets about Mafia gangs built by examining different digital trails and judicial documents spanning a period of ten years: the former dataset includes the phone contacts among suspected individuals, the latter is constituted by the relationships among individuals actively involved in various criminal offenses. Our report illustrates the limits of traditional investigation methods like tapping: criminals high up in the organization hierarchy do not occupy the most central positions in the criminal network, and oftentimes do not appear in the reconstructed criminal network at all. However, we also suggest possible strategies of intervention, as we show that although criminal networks (i.e., the network encoding mobsters and crime relationships) are extremely resilient to different kinds of attacks, contact networks (i.e., the network reporting suspects and reciprocated phone calls) are much more vulnerable and their analysis can yield extremely valuable insights.

Submitted 4 September, 2015; originally announced September 2015.

Comments: 22 pages, 10 figures, 1 table

Journal ref: Information Sciences, 351, 30-47. 2016

arXiv:1508.04185 [pdf, other]

doi 10.1145/2818048.2820065

Style in the Age of Instagram: Predicting Success within the Fashion Industry using Social Media

Authors: Jaehyuk Park, Giovanni Luca Ciampaglia, Emilio Ferrara

Abstract: Fashion is a multi-billion dollar industry with social and economic implications worldwide. To gain popularity, brands want to be represented by the top popular models. As new faces are selected using stringent (and often criticized) aesthetic criteria, a priori predictions are made difficult by information cascades and other fundamental trend-setting mechanisms. However, the increasing usage of social media within and without the industry may be affecting this traditional system. We therefore seek to understand the ingredients of success of fashion models in the age of Instagram. Combining data from a comprehensive online fashion database and the popular mobile image-sharing platform, we apply a machine learning framework to predict the tenure of a cohort of new faces for the 2015 Spring/Summer season throughout the subsequent 2015-16 Fall/Winter season. Our framework successfully predicts most of the new popular models who appeared in 2015. In particular, we find that a strong social media presence may be more important than being under contract with a top agency, or than the aesthetic standards sought after by the industry.

Submitted 17 August, 2015; originally announced August 2015.

Comments: 10 pages, 5 figures, accepted for presentation at CSCW'16

Journal ref: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (pp. 64-73). ACM. 2016

arXiv:1506.06072 [pdf, other]

doi 10.7717/peerj-cs.26

Quantifying the Effect of Sentiment on Information Diffusion in Social Media

Authors: Emilio Ferrara, Zeyao Yang

Abstract: Social media have become the main vehicle of information production and consumption online. Millions of users every day log on to their Facebook or Twitter accounts to get updates and news, read about their topics of interest, and become exposed to new opportunities and interactions. Although recent studies suggest that the content users produce will affect the emotions of their readers, we still lack a rigorous understanding of the role and effects of content sentiment on the dynamics of information diffusion. This work aims at quantifying the effect of sentiment on information diffusion, to understand: (i) whether positive conversations spread faster and/or broader than negative ones (or vice-versa); (ii) what kind of emotions are more typical of popular conversations on social media; and, (iii) what type of sentiment is expressed in conversations characterized by different temporal dynamics. Our findings show that, at the level of content, negative messages spread faster than positive ones, but positive ones reach larger audiences, suggesting that people are more inclined to share and favorite positive content, the so-called positive bias. As for the entire conversations, we highlight how different temporal dynamics exhibit different sentiment patterns: for example, positive sentiment builds up for highly-anticipated events, while unexpected events are mainly characterized by negative sentiment. Our contribution is a milestone toward understanding how the emotions expressed in short texts affect their spreading in online social ecosystems, and may help to craft effective policies and strategies for content generation and diffusion.

Submitted 19 June, 2015; originally announced June 2015.

Comments: 10 pages, 5 figures

Journal ref: PeerJ Computer Science, 1, e26. 2015

arXiv:1506.06021 [pdf, other]

doi 10.1371/journal.pone.0142390

Measuring Emotional Contagion in Social Media

Authors: Emilio Ferrara, Zeyao Yang

Abstract: Social media are used as main discussion channels by millions of individuals every day. The content individuals produce in daily social-media-based micro-communications, and the emotions therein expressed, may impact the emotional states of others. A recent experiment performed on Facebook hypothesized that emotions spread online, even in the absence of non-verbal cues typical of in-person interactions, and that individuals are more likely to adopt positive or negative emotions if these are over-expressed in their social network. Experiments of this type, however, raise ethical concerns, as they require massive-scale content manipulation with unknown consequences for the individuals therein involved. Here, we study the dynamics of emotional contagion using Twitter. Rather than manipulating content, we devise a null model that discounts some confounding factors (including the effect of emotional contagion). We measure the emotional valence of content the users are exposed to before posting their own tweets. We determine that on average a negative post follows an over-exposure to 4.34% more negative content than baseline, while positive posts occur after an average over-exposure to 4.50% more positive content. We highlight the presence of a linear relationship between the average emotional valence of the stimuli users are exposed to, and that of the responses they produce. We also identify two different classes of individuals: highly and scarcely susceptible to emotional contagion. Highly susceptible users are significantly less inclined to adopt negative emotions than the scarcely susceptible ones, but equally likely to adopt positive emotions. In general, the likelihood of adopting positive emotions is much greater than that of negative emotions.

Submitted 19 June, 2015; originally announced June 2015.

Comments: 10 pages, 5 figures

Journal ref: PloS one, 10(11), e0142390. 2015

arXiv:1505.06454 [pdf, other]

doi 10.1073/pnas.1424329112

Defining and identifying Sleeping Beauties in science

Authors: Qing Ke, Emilio Ferrara, Filippo Radicchi, Alessandro Flammini

Abstract: A Sleeping Beauty (SB) in science refers to a paper whose importance is not recognized for several years after publication. Its citation history exhibits a long hibernation period followed by a sudden spike of popularity. Previous studies suggest a relative scarcity of SBs. The reliability of this conclusion is, however, heavily dependent on identification methods based on arbitrary threshold parameters for sleeping time and number of citations, applied to small or monodisciplinary bibliographic datasets. Here we present a systematic, large-scale, and multidisciplinary analysis of the SB phenomenon in science. We introduce a parameter-free measure that quantifies the extent to which a specific paper can be considered an SB. We apply our method to 22 million scientific papers published in all disciplines of natural and social sciences over a time span longer than a century. Our results reveal that the SB phenomenon is not exceptional. There is a continuous spectrum of delayed recognition where both the hibernation period and the awakening intensity are taken into account. Although many cases of SBs can be identified by looking at monodisciplinary bibliographic data, the SB phenomenon becomes much more apparent with the analysis of multidisciplinary datasets, where we can observe many examples of papers achieving delayed yet exceptional importance in disciplines different from those where they were originally published. Our analysis emphasizes a complex feature of citation dynamics that so far has received little attention, and also provides empirical evidence against the use of short-term citation metrics in the quantification of scientific impact.

Submitted 24 May, 2015; originally announced May 2015.

Comments: 40 pages, Supporting Information included, top examples listed at http://qke.github.io/projects/beauty/beauty.html

Journal ref: Proc. Natl. Acad. Sci. USA 112, 7426-7431 (2015)

arXiv:1503.03752 [pdf, other]

doi 10.1145/2749279.2749283

Manipulation and abuse on social media

Authors: Emilio Ferrara

Abstract: The computer science research community has become increasingly interested in the study of social media due to their pervasiveness in the everyday life of millions of individuals. Methodological questions and technical challenges abound as more and more data from social platforms become available for analysis. This data deluge not only yields the unprecedented opportunity to unravel questions about online individuals' behavior at scale, but also allows us to explore the potential perils that the massive adoption of social media brings to our society. These communication channels provide plenty of incentives (both economic and social) and opportunities for abuse. As social media activity became increasingly intertwined with the events in the offline world, individuals and organizations have found ways to exploit these platforms to spread misinformation, to attack and smear others, or to deceive and manipulate. During crises, social media have been effectively used for emergency response, but fear-mongering actions have also triggered mass hysteria and panic. Criminal gangs and terrorist organizations like ISIS adopt social media for propaganda and recruitment. Synthetic activity and social bots have been used to coordinate orchestrated astroturf campaigns, to manipulate political elections and the stock market. The lack of effective content verification systems on many of these platforms, including Twitter and Facebook, raises concerns when younger users become exposed to cyber-bullying, harassment, or hate speech, inducing risks like depression and suicide. This article illustrates some of the recent advances facing these issues and discusses what remains to be done, including the challenges to address in the future to make social media a more useful and accessible, safer and healthier environment for all users.

Submitted 12 March, 2015; v1 submitted 12 March, 2015; originally announced March 2015.

Comments: ACM SIGWEB Newsletter, Spring 2015

arXiv:1502.05886 [pdf, other]

doi 10.1145/2817946.2817949

On predictability of rare events leveraging social media: a machine learning perspective

Authors: Lei Le, Emilio Ferrara, Alessandro Flammini

Abstract: Information extracted from social media streams has been leveraged to forecast the outcome of a large number of real-world events, from political elections to stock market fluctuations. An increasing number of studies demonstrate how the analysis of social media conversations provides cheap access to the wisdom of the crowd. However, the extents and contexts in which such forecasting power can be effectively leveraged are still unverified, at least in a systematic way. It is also unclear how social-media-based predictions compare to those based on alternative information sources. To address these issues, here we develop a machine learning framework that leverages social media streams to automatically identify and predict the outcomes of soccer matches. We focus in particular on matches in which at least one of the possible outcomes is deemed highly unlikely by professional bookmakers. We argue that sport events offer a systematic approach for testing the predictive power of social media, and allow us to compare such power against the rigorous baselines set by external sources. Despite such strict baselines, our framework yields above 8% marginal profit when used to inform simple betting strategies. The system is based on real-time sentiment analysis and exploits data collected immediately before the games, allowing for informed bets. We discuss the rationale behind our approach, describe the learning framework, its prediction performance and the return it provides as compared to a set of betting strategies. To test our framework we use both historical Twitter data from the 2014 FIFA World Cup games, and real-time Twitter data collected by monitoring the conversations about all soccer matches of four major European tournaments (FA Premier League, Serie A, La Liga, and Bundesliga), and the 2014 UEFA Champions League, during the period between Oct. 25th 2014 and Nov. 26th 2014.

Submitted 20 February, 2015; originally announced February 2015.

Comments: 10 pages, 10 tables, 8 figures

Journal ref: Proceedings of the 2015 ACM on Conference on Online Social Networks (pp. 3-13). ACM. 2015

arXiv:1411.7357 [pdf, other]

doi 10.1016/j.joi.2015.07.008

Quality versus quantity in scientific impact

Authors: Jasleen Kaur, Emilio Ferrara, Filippo Menczer, Alessandro Flammini, Filippo Radicchi

Abstract: Citation metrics are becoming pervasive in the quantitative evaluation of scholars, journals and institutions. More than ever before, hiring, promotion, and funding decisions rely on a variety of impact metrics that cannot disentangle quality from quantity of scientific output, and are biased by factors such as discipline and academic age. Biases affecting the evaluation of single papers are compounded when one aggregates citation-based metrics across an entire publication record. It is not trivial to compare the quality of two scholars who during their careers have published at different rates in different disciplines in different periods of time. We propose a novel solution based on the generation of a statistical baseline specifically tailored to the academic profile of each researcher. Our method can decouple the roles of quantity and quality of publications to explain how a certain level of impact is achieved. The method is flexible enough to allow for the evaluation of, and fair comparison among, arbitrary collections of papers (scholar publication records, journals, and entire institutions), and can be extended to simultaneously suppress any source of bias. We show that our method can capture the quality of the work of Nobel laureates irrespective of number of publications, academic age, and discipline, even when traditional metrics indicate low impact in absolute terms. We further apply our methodology to almost a million scholars and over six thousand journals to measure the impact that cannot be explained by the volume of publications alone.

Submitted 15 December, 2014; v1 submitted 26 November, 2014; originally announced November 2014.

Comments: 20 pages, 7 figures, and 1 table

Journal ref: Journal of Informetrics 9 (2015), pp. 800-808

arXiv:1411.0652 [pdf, other]

doi 10.1007/s13278-014-0237-x

Clustering memes in social media streams

Authors: Mohsen JafariAsbagh, Emilio Ferrara, Onur Varol, Filippo Menczer, Alessandro Flammini

Abstract: The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them over time with the new ones. The evaluation of our framework is carried out by using a dataset of Twitter trending topics. Over a one-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that only use content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. We finally show that our online learning framework is flexible, due to its independence of the adopted clustering algorithm, and best suited to work in a streaming scenario.

Submitted 3 November, 2014; originally announced November 2014.

Comments: 25 pages, 8 figures, accepted on Social Network Analysis and Mining (SNAM). The final publication is available at Springer via http://dx.doi.org/10.1007/s13278-014-0237-x

Journal ref: Social Network Analysis and Mining, 4(1), 1-13. 2014

arXiv:1407.5225 [pdf, other]

doi 10.1145/2818717

The Rise of Social Bots

Authors: Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, Alessandro Flammini

Abstract: The Turing test aimed to distinguish the behavior of a human from that of a computer algorithm. Such a challenge is more relevant than ever in today's social media context, where limited attention and technology constrain the expressive power of humans, while incentives abound to develop software agents mimicking humans. These social bots interact, often unnoticed, with real people in social media ecosystems, but their abundance is uncertain. While many bots are benign, one can design harmful bots with the goals of persuading, smearing, or deceiving. Here we discuss the characteristics of modern, sophisticated social bots, and how their presence can endanger online ecosystems and our society. We then review current efforts to detect social bots on Twitter. Features related to content, network, sentiment, and temporal patterns of activity are imitated by bots but at the same time can help discriminate synthetic behaviors from human ones, yielding signatures of engineered social tampering.

Submitted 6 March, 2017; v1 submitted 19 July, 2014; originally announced July 2014.

Comments: Check http://cacm.acm.org/magazines/2016/7/204021-the-rise-of-social-bots/fulltext for the final version; 'Bot or Not?' is available at: http://truthy.indiana.edu/botornot/

Journal ref: Communications of the ACM 59 (7), 96-104, 2016

arXiv:1407.2837 [pdf, other]

Visualizing criminal networks reconstructed from mobile phone records

Authors: Emilio Ferrara, Pasquale De Meo, Salvatore Catanese, Giacomo Fiumara

Abstract: In the fight against racketeering and terrorism, knowledge about the structure and the organization of criminal networks is of fundamental importance for both the investigations and the development of efficient strategies to prevent and restrain crimes. Intelligence agencies exploit information obtained from the analysis of large amounts of heterogeneous data deriving from various informative sources, including the records of phone traffic, the social networks, surveillance data, interview data, experiential police data, and police intelligence files, to acquire knowledge about criminal networks and initiate accurate and destabilizing actions. In this context, visual representation techniques coordinate the exploration of the structure of the network together with the metrics of social network analysis. Nevertheless, the utility of visualization tools may become limited when the dimension and the complexity of the system under analysis grow beyond certain limits. In this paper we show how we employ some interactive visualization techniques to represent criminal and terrorist networks reconstructed from phone traffic data, namely foci, fisheye and geo-mapping network layouts. These methods allow the exploration of the network through animated transitions among visualization models and local enlargement techniques in order to improve the comprehension of interesting areas. By combining the features of the various visualization models it is possible to gain substantial enhancements with respect to classic visualization models, which are often unreadable when the network is highly complex.

Submitted 10 July, 2014; originally announced July 2014.

Comments: 6 pages, 4 figures, DataWiz 2014 (held in conjunction with ACM Hypertext 2014)

arXiv:1406.7751 [pdf, other]

doi 10.1145/2631775.2631808

Online Popularity and Topical Interests through the Lens of Instagram

Authors: Emilio Ferrara, Roberto Interdonato, Andrea Tagarelli

Abstract: Online socio-technical systems can be studied as a proxy for the real world to investigate human behavior and social interactions at scale. Here we focus on Instagram, a media-sharing online platform whose popularity has risen to hundreds of millions of users. Instagram exhibits a mixture of features including social structure, social tagging and media sharing. The network of social interactions among users models various dynamics including follower/followee relations and users' communication by means of posts/comments. Users can upload and tag media such as photos and pictures, and they can "like" and comment on each piece of information on the platform. In this work we investigate three major aspects of our Instagram dataset: (i) the structural characteristics of its network of heterogeneous interactions, to unveil the emergence of self organization and topically-induced community structure; (ii) the dynamics of content production and consumption, to understand how global trends and popular users emerge; (iii) the behavior of users labeling media with tags, to determine how they devote their attention and to explore the variety of their topical interests. Our analysis provides clues to understand human behavior dynamics on socio-technical systems, specifically users and content popularity, the mechanisms of users' interactions in online environments and how collective trends emerge from individuals' topical interests.

Submitted 30 June, 2014; originally announced June 2014.

Comments: 11 pages, 11 figures, Proceedings of ACM Hypertext 2014

Journal ref: Proceedings of the 25th ACM conference on Hypertext and social media (pp. 24-34). ACM. 2014

arXiv:1406.7197 [pdf, other]

doi 10.1145/2615569.2615699

Evolution of Online User Behavior During a Social Upheaval

Authors: Onur Varol, Emilio Ferrara, Christine L. Ogan, Filippo Menczer, Alessandro Flammini

Abstract: Social media represent powerful tools of mass communication and information diffusion. They played a pivotal role during recent social uprisings and political mobilizations across the world. Here we present a study of the Gezi Park movement in Turkey through the lens of Twitter. We analyze over 2.3 million tweets produced during the 25 days of protest that occurred between May and June 2013. We first characterize the spatio-temporal nature of the conversation about the Gezi Park demonstrations, showing that similarity in trends of discussion mirrors geographic cues. We then describe the characteristics of the users involved in this conversation and what roles they played. We study how roles and individual influence evolved during the period of the upheaval. This analysis reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. We conclude by observing how the online and offline worlds are tightly intertwined, showing that exogenous events, such as political speeches or police actions, affect social media conversations and trigger changes in individual behavior.

Submitted 27 June, 2014; originally announced June 2014.

Comments: Best Paper Award at ACM Web Science 2014

Journal ref: Proceedings of the 2014 ACM conference on Web science, Pages 81-90

arXiv:1404.1295 [pdf, other]

doi 10.1016/j.eswa.2014.03.024

Detecting criminal organizations in mobile phone networks

Authors: Emilio Ferrara, Pasquale De Meo, Salvatore Catanese, Giacomo Fiumara

Abstract: The study of criminal networks using traces from heterogeneous communication media is acquiring increasing importance in today's society. The usage of communication media such as phone calls and online social networks leaves digital traces in the form of metadata that can be used for this type of analysis. The goal of this work is twofold: first we provide a theoretical framework for the problem of detecting and characterizing criminal organizations in networks reconstructed from phone call records. Then, we introduce an expert system to support law enforcement agencies in the task of unveiling the underlying structure of criminal networks hidden in communication data. This platform allows for statistical network analysis, community detection and visual exploration of mobile phone network data. It allows forensic investigators to deeply understand hierarchies within criminal organizations, discovering members who play a central role and provide connections among sub-groups. Our work concludes by illustrating the adoption of our computational framework for a real-world criminal investigation.

Submitted 3 April, 2014; originally announced April 2014.

Comments: http://www.sciencedirect.com/science/article/pii/S0957417414001614. Expert Systems with Applications, 2014

Journal ref: Expert Systems with Applications, 41(13), 5733-5750. 2014

arXiv:1402.1778 [pdf, other]

doi 10.1109/TSMC.2014.2378215

Analysis of a heterogeneous social network of humans and cultural objects

Authors: Santa Agreste, Pasquale De Meo, Emilio Ferrara, Sebastiano Piccolo, Alessandro Provetti

Abstract: Modern online social platforms enable their members to be involved in a broad range of activities like making friends, joining groups, posting/commenting on resources and so on. In this paper we investigate whether a correlation emerges across the different activities a user can take part in. To perform our analysis we focused on aNobii, a social platform with a world-wide user base of book readers, who like to post their readings, give ratings, review books and discuss them with friends and fellow readers. aNobii presents a heterogeneous structure: i) part social network, with user-to-user interactions, ii) part interest network, with the management of book collections, and iii) part folksonomy, with books that are tagged by the users. We analyzed a complete and anonymized snapshot of aNobii and we focused on three specific activities a user can perform, namely her tagging behavior, her tendency to join groups and her aptitude to compile a wishlist reporting the books she is planning to read. In this way each user is associated with a tag-based, a group-based and a wishlist-based profile. Experimental analysis carried out by means of Information Theory tools like entropy and mutual information suggests that tag-based and group-based profiles are in general more informative than wishlist-based ones. Furthermore, we discover that the degree of correlation between the three profiles associated with the same user tends to be small. Hence, user profiling cannot be reduced to considering just any one type of user activity (although important) but it is crucial to incorporate multiple dimensions to effectively describe users' preferences and behavior.

Submitted 7 February, 2014; originally announced February 2014.

Comments: 12 pages, 9 figures - Transactions on Systems, Man and Cybernetics: Systems - under review

Journal ref: IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol.45, no.4, pp.559,570, April 2015

arXiv:1401.1257 [pdf, other]

doi 10.1103/PhysRevLett.113.088701

Optimal network modularity for information diffusion

Authors: Azadeh Nematzadeh, Emilio Ferrara, Alessandro Flammini, Yong-Yeol Ahn

Abstract: We investigate the impact of community structure on information diffusion with the linear threshold model. Our results demonstrate that modular structure may have counter-intuitive effects on information diffusion when social reinforcement is present. We show that strong communities can facilitate global diffusion by enhancing local, intra-community spreading. Using both analytic approaches and numerical simulations, we demonstrate the existence of an optimal network modularity, where global diffusion requires the minimal number of early adopters.

Submitted 18 September, 2014; v1 submitted 6 January, 2014; originally announced January 2014.

Comments: 8 pages, 10 figures

Journal ref: Phys. Rev. Lett. 113, 088701 (2014)
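
For readers unfamiliar with the linear threshold model named in the abstract above, the following is a minimal Python sketch, not the authors' code. It assumes a single threshold shared by all nodes (a simplification made for this sketch); the function name `linear_threshold` and the toy network are hypothetical. A node becomes active once the fraction of its active neighbors reaches the threshold.

```python
def linear_threshold(neighbors, seeds, threshold):
    """Spread activation until no inactive node meets the threshold.

    neighbors: dict mapping each node to a list of adjacent nodes.
    seeds: iterable of initially active nodes (the early adopters).
    threshold: fraction of active neighbors needed to activate a node
               (one shared value is an assumption of this sketch).
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in active or not nbrs:
                continue
            active_fraction = sum(n in active for n in nbrs) / len(nbrs)
            if active_fraction >= threshold:
                active.add(node)
                changed = True
    return active


# Hypothetical toy network: two triangles joined by the bridge 2-3.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

contained = linear_threshold(toy_graph, seeds={0, 1}, threshold=0.4)
global_cascade = linear_threshold(toy_graph, seeds={0, 1}, threshold=0.3)
```

In this toy case the cascade stays inside the first triangle at a threshold of 0.4, because node 3 has only one active neighbor out of three, and crosses the bridge at 0.3. It illustrates the mechanism only and does not reproduce the paper's results.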

arXiv:1310.4399 [pdf, other]

doi 10.1145/2535526

Analyzing User Behavior across Social Sharing Environments

Authors: Pasquale De Meo, Emilio Ferrara, Fabian Abel, Lora Aroyo, Geert-Jan Houben

Abstract: In this work we present an in-depth analysis of the user behaviors on different Social Sharing systems. We consider three popular platforms, Flickr, Delicious and StumbleUpon, and, by combining techniques from social network analysis with techniques from semantic analysis, we characterize the tagging behavior as well as the tendency to create friendship relationships of the users of these platforms. The aim of our investigation is to see if (and how) the features and goals of a given Social Sharing system reflect on the behavior of its users and, moreover, if there exists a correlation between the social and tagging behavior of the users. We report our findings in terms of the characteristics of user profiles according to three different dimensions: (i) intensity of user activities, (ii) tag-based characteristics of user profiles, and (iii) semantic characteristics of user profiles.

Submitted 16 October, 2013; originally announced October 2013.

Journal ref: ACM Transactions on Intelligent Systems and Technology, Vol. 5, No. 1, Article 1 (2013)

arXiv:1310.2671 [pdf, other]

doi 10.1145/2512938.2512956

Traveling Trends: Social Butterflies or Frequent Fliers?

Authors: Emilio Ferrara, Onur Varol, Filippo Menczer, Alessandro Flammini

Abstract: Trending topics are the online conversations that grab collective attention on social media. They are continually changing and often reflect exogenous events that happen in the real world. Trends are localized in space and time as they are driven by activity in specific geographic areas that act as sources of traffic and information flow. Taken independently, trends and geography have been discussed in recent literature on online social media; although, so far, little has been done to characterize the relation between trends and geography. Here we investigate more than eleven thousand topics that trended on Twitter in 63 main US locations during a period of 50 days in 2013. This data allows us to study the origins and pathways of trends, how they compete for popularity at the local level to emerge as winners at the country level, and what dynamics underlie their production and consumption in different geographic areas. We identify two main classes of trending topics: those that surface locally, coinciding with three different geographic clusters (East coast, Midwest and Southwest); and those that emerge globally from several metropolitan areas, coinciding with the major air traffic hubs of the country. These hubs act as trendsetters, generating topics that eventually trend at the country level, and driving the conversation across the country. This poses an intriguing conjecture, drawing a parallel between the spread of information and diseases: Do trends travel faster by airplane than over the Internet?

Submitted 9 October, 2013; originally announced October 2013.

Comments: Proceedings of the first ACM conference on Online social networks, pp. 213-222, 2013

Journal ref: Proceedings of the first ACM conference on Online social networks (pp. 213-222). ACM. 2013

arXiv:1310.2665 [pdf, other]

doi 10.1145/2492517.2492530

Clustering Memes in Social Media

Authors: Emilio Ferrara, Mohsen JafariAsbagh, Onur Varol, Vahed Qazvinian, Filippo Menczer, Alessandro Flammini

Abstract: The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.

Submitted 9 October, 2013; originally announced October 2013.

Comments: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM'13), 2013

Journal ref: Advances in social networks analysis and mining (ASONAM), 2013 IEEE/ACM international conference on (pp. 548-555). IEEE

arXiv:1306.5474 [pdf]

doi 10.1371/journal.pone.0064679

The Digital Evolution of Occupy Wall Street

Authors: Michael D. Conover, Emilio Ferrara, Filippo Menczer, Alessandro Flammini

Abstract: We examine the temporal evolution of digital communication activity relating to the American anti-capitalist movement Occupy Wall Street. Using a high-volume sample from the microblogging site Twitter, we investigate changes in Occupy participant engagement, interests, and social connectivity over a fifteen-month period starting three months prior to the movement's first protest action. The results of this analysis indicate that, on Twitter, the Occupy movement tended to elicit participation from a set of highly interconnected users with pre-existing interests in domestic politics and foreign social movements. These users, while highly vocal in the months immediately following the birth of the movement, appear to have lost interest in Occupy-related communication over the remainder of the study period.

Submitted 23 June, 2013; originally announced June 2013.

Comments: Open access available at: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0064679

Journal ref: PLoS ONE 8(5):e64679 2013

arXiv:1306.5473 [pdf]

doi 10.1371/journal.pone.0055957

The Geospatial Characteristics of a Social Movement Communication Network

Authors: Michael D. Conover, Clayton Davis, Emilio Ferrara, Karissa McKelvey, Filippo Menczer, Alessandro Flammini

Abstract: Social movements rely in large measure on networked communication technologies to organize and disseminate information relating to the movements' objectives. In this work we seek to understand how the goals and needs of a protest movement are reflected in the geographic patterns of its communication network, and how these patterns differ from those of stable political communication. To this end, we examine an online communication network reconstructed from over 600,000 tweets from a thirty-six-week period covering the birth and maturation of the American anticapitalist movement, Occupy Wall Street. We find that, compared to a network of stable domestic political communication, the Occupy Wall Street network exhibits higher levels of locality and a hub and spoke structure, in which the majority of non-local attention is allocated to high-profile locations such as New York, California, and Washington D.C. Moreover, we observe that information flows across state boundaries are more likely to contain framing language and references to the media, while communication among individuals in the same state is more likely to reference protest action and specific places and times. Tying these results to social movement theory, we propose that these features reflect the movement's efforts to mobilize resources at the local level and to develop narrative frames that reinforce collective purpose at the national level.

Submitted 23 June, 2013; originally announced June 2013.

Comments: Open access available at: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0064679

Journal ref: PLoS ONE 8(3):e55957 2013

arXiv:1303.1827 [pdf, other]

doi 10.1007/s13278-012-0060-1

Forensic Analysis of Phone Call Networks

Authors: Salvatore Catanese, Emilio Ferrara, Giacomo Fiumara

Abstract: In the context of preventing and fighting crime, the analysis of mobile phone traffic among actors of a criminal network is helpful in order to reconstruct illegal activities on the basis of the relationships connecting those specific individuals. Thus, forensic analysts and investigators require new advanced tools and techniques which allow them to manage these data in a meaningful and efficient way. In this paper we present LogAnalysis, a tool we developed to provide visual data representation and filtering, statistical analysis features and the possibility of a temporal analysis of mobile phone activities. Its adoption may help in unveiling the structure of a criminal network and the roles and dynamics of communications among its components. By using LogAnalysis, forensic investigators could deeply understand hierarchies within criminal organizations, for example discovering central members that provide connections among different sub-groups. Moreover, by analyzing the temporal evolution of the contacts among individuals, or by focusing on specific time windows, they could acquire additional insights on the data they are analyzing. Finally, we highlight how the adoption of LogAnalysis may be crucial to solve real cases, providing as examples a number of case studies inspired by real forensic investigations led by one of the authors.

Submitted 7 March, 2013; originally announced March 2013.

Comments: 18 pages, 10 figures

Journal ref: Social Network Analysis and Mining, 3(1):15-33, 2013

arXiv:1303.1747 [pdf, other]

doi 10.1016/j.knosys.2012.01.007

A Novel Measure of Edge Centrality in Social Networks

Authors: Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Angela Ricciardello

Abstract: The problem of assigning centrality values to nodes and edges in graphs has been widely investigated in recent years. Recently, a novel measure of node centrality has been proposed, called k-path centrality index, which is based on the propagation of messages inside a network along paths consisting of at most k edges. On the other hand, the importance of computing the centrality of edges has been highlighted since the 1970s by Anthonisse and, subsequently, by Girvan and Newman. In this work we propose the generalization of the concept of k-path centrality by defining the k-path edge centrality, a measure of centrality introduced to compute the importance of edges. We provide an efficient algorithm, running in O(k m), where m is the number of edges in the graph. Thus, our technique is feasible for large scale network analysis. Finally, the performance of our algorithm is analyzed, discussing the results obtained against large online social network datasets.

Submitted 7 March, 2013; originally announced March 2013.

Comments: 28 pages, 5 figures

Journal ref: Knowledge-based Systems, 30:136-150, 2012

arXiv:1303.1741 [pdf, ps, other]

doi 10.1016/j.ins.2012.08.001

Enhancing community detection using a network weighting strategy

Authors: Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti

Abstract: A community within a network is a group of vertices densely connected to each other but less connected to the vertices outside. The problem of detecting communities in large networks plays a key role in a wide range of research areas, e.g. Computer Science, Biology and Sociology. Most of the existing algorithms to find communities rely on the topological features of the network and often do not scale well on large, real-life instances. In this article we propose a strategy to enhance existing community detection algorithms by adding a pre-processing step in which edges are weighted according to their centrality w.r.t. the network topology. In our approach, the centrality of an edge reflects its contribution to making arbitrary graph traversals, i.e., spreading messages over the network, as short as possible. Our strategy effectively complements information about network topology and it can be used as an additional tool to enhance community detection. The computation of edge centralities is carried out by performing multiple random walks of bounded length on the network. Our method makes the computation of edge centralities feasible also on large-scale networks. It has been tested in conjunction with three state-of-the-art community detection algorithms, namely the Louvain method, COPRA and OSLOM. Experimental results show that our method raises the accuracy of existing algorithms both on synthetic and real-life datasets.

Submitted 7 March, 2013; originally announced March 2013.

Comments: 28 pages, 2 figures

Journal ref: Information Sciences, 222:648-668, 2013

arXiv:1303.1738 [pdf, other]

doi 10.1016/j.jcss.2013.03.012

Mixing local and global information for community detection in large networks

Authors: Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti

Abstract: The problem of clustering large complex networks plays a key role in several scientific fields ranging from Biology to Sociology and Computer Science. Many approaches to clustering complex networks are based on the idea of maximizing a network modularity function. Some of these approaches can be classified as global because they exploit knowledge about the whole network topology to find clusters. Other approaches, instead, can be interpreted as local because they require only a partial knowledge of the network topology, e.g., the neighbors of a vertex. Global approaches are able to achieve high values of modularity but they do not scale well on large networks and, therefore, they cannot be applied to analyze on-line social networks like Facebook or YouTube. In contrast, local approaches are fast and scale up to large, real-life networks, at the cost of poorer results than those achieved by global methods. In this article we propose a glocal method for maximizing modularity, i.e., our method uses information at the global level, yet its scalability on large networks is comparable to that of local methods. The proposed method is called COmplex Network CLUster DEtection (CONCLUDE for short). It works in two stages: in the first stage it uses an information-propagation model, based on random and non-backtracking walks of finite length, to compute the importance of each edge in keeping the network connected (called edge centrality). Then, edge centrality is used to map network vertices onto points of a Euclidean space and to compute distances between all pairs of connected vertices. In the second stage, CONCLUDE uses the distances computed in the first stage to partition the network into clusters. CONCLUDE is computationally efficient since in the average case its cost is roughly linear in the number of edges of the network.

Submitted 16 October, 2013; v1 submitted 7 March, 2013; originally announced March 2013.

Journal ref: Journal of Computer and System Sciences 80(1):72-87, 2014
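
Illustrative note (not part of the listing): the abstract above refers to maximizing a network modularity function. As a minimal sketch, assuming the standard Newman modularity for an undirected, unweighted graph without duplicate edges, the quantity can be computed as follows. The function name `modularity`, the toy graph, and the partition are hypothetical and are not taken from the paper; this is not the CONCLUDE algorithm itself, only the objective that such methods try to maximize.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition (assumed definition, see lead-in).

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping each vertex to a community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the vertices in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, partition)
```

Placing every vertex in a single community gives Q = 0 under this definition, so a positive value indicates more intra-community edges than a degree-preserving random baseline would produce.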

arXiv:1203.0535 [pdf]

doi 10.1145/2629438

On Facebook, most ties are weak

Authors: Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti

Abstract: Pervasive socio-technical networks bring new conceptual and technological challenges to developers and users alike. A central research theme is evaluation of the intensity of relations linking users and how they facilitate communication and the spread of information. These aspects of human relationships have been studied extensively in the social sciences under the framework of the "strength of weak ties" theory proposed by Mark Granovetter. Some research has considered whether that theory can be extended to online social networks like Facebook, suggesting interaction data can be used to predict the strength of ties. The approaches being used require handling user-generated data that is often not publicly available due to privacy concerns. Here, we propose an alternative definition of weak and strong ties that requires knowledge of only the topology of the social network (such as who is a friend of whom on Facebook), relying on the fact that online social networks, or OSNs, tend to fragment into communities. We thus suggest classifying as weak ties those edges linking individuals belonging to different communities and strong ties as those connecting users in the same community. We tested this definition on a large network representing part of the Facebook social graph and studied how weak and strong ties affect the information-diffusion process. Our findings suggest individuals in OSNs self-organize to create well-connected communities, while weak ties yield cohesion and optimize the coverage of information spread.

Submitted 1 November, 2014; v1 submitted 2 March, 2012; originally announced March 2012.

Comments: Accepted version of the manuscript before ACM editorial work. Check http://cacm.acm.org/magazines/2014/11/179820-on-facebook-most-ties-are-weak/ for the final version

Journal ref: Communications of the ACM, Vol. 57 No. 11, Pages 78-84, 2014
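
Illustrative note (not part of the listing): the abstract above defines weak ties as edges linking individuals in different communities and strong ties as edges within the same community. A minimal sketch of that topology-only rule, assuming a community assignment is already available, might look like this. The function name `classify_ties` and the toy data are hypothetical; how the communities are detected is outside the scope of this sketch.

```python
def classify_ties(edges, community_of):
    """Split edges into (weak, strong) using only community membership.

    An edge is 'strong' when both endpoints share a community label and
    'weak' otherwise, following the definition quoted in the abstract.
    """
    weak, strong = [], []
    for u, v in edges:
        if community_of[u] == community_of[v]:
            strong.append((u, v))
        else:
            weak.append((u, v))
    return weak, strong


# Hypothetical friendship graph: two tight groups linked by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
weak, strong = classify_ties(edges, community_of)
```

In this toy graph the only weak tie is the bridge between the two groups, which is the kind of edge the abstract associates with cohesion and wider information spread.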

arXiv:1202.0331 [pdf, other]

doi 10.1685/journal.caim.381

Topological Features of Online Social Networks

Authors: Emilio Ferrara, Giacomo Fiumara

Abstract: The importance of modeling and analyzing Social Networks is a consequence of the success of Online Social Networks in recent years. Several models of networks have been proposed, reflecting the different characteristics of Social Networks. Some of them are better suited to modeling specific phenomena, such as the growth and the evolution of Social Networks; others are more appropriate for capturing the topological characteristics of the networks. Because these networks show unique and different properties and features, in this work we describe and exploit several models in order to capture the structure of popular Online Social Networks, such as arXiv, Facebook, Wikipedia and YouTube. Our experimentation aims at verifying the structural characteristics of these networks, in order to understand which model best depicts their structure, and at analyzing the inner community structure, to illustrate how members of these Online Social Networks interact and group together into smaller communities.

Submitted 1 February, 2012; originally announced February 2012.

MSC Class: 91D30; 05C82; 68R10; 90B10; 90C35

Journal ref: Communications on Applied and Industrial Mathematics, 2(2):1-20, 2011

arXiv:1109.6698 [pdf, ps, other]

doi 10.1109/ISDA.2011.6121719

Improving Recommendation Quality by Merging Collaborative Filtering and Social Relationships

Authors: Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti

Abstract: Matrix Factorization techniques have been successfully applied to raise the quality of suggestions generated by Collaborative Filtering Systems (CFSs). Traditional CFSs based on Matrix Factorization operate on the ratings provided by users and have recently been extended to incorporate demographic aspects such as age and gender. In this paper we propose to merge CFSs based on Matrix Factorization with information regarding social friendships in order to provide users with more accurate suggestions and rankings of items of interest to them. The proposed approach has been evaluated on a real-life online social network; the experimental results show an improvement over existing CFSs. A detailed comparison with related literature is also presented.

Submitted 29 September, 2011; originally announced September 2011.

Comments: 6 pages, Proceedings of the 11th International Conference on Intelligent Systems Design and Applications

Journal ref: Proceedings of the 11th International Conference on Intelligent Systems Design and Applications, pp. 587-592, 2011

Showing 1–50 of 57 results for author: Ferrara, E