-
Limits of PageRank-based ranking methods in sports data
Authors:
Yuhao Zhou,
Ruijie Wang,
Yi-Cheng Zhang,
An Zeng,
Matúš Medo
Abstract:
While PageRank has been extensively used to rank sport tournament participants (teams or individuals), its superiority over simpler ranking methods has never been clearly demonstrated. We use sports results from 18 major leagues to calibrate a state-of-the-art model for synthetic sports results. Model data are then used to assess the ranking performance of PageRank in a controlled setting. We find that PageRank outperforms the benchmark ranking by the number of wins only when a small fraction of all games have been played. Increased randomness in the data, such as intrinsic randomness of outcomes or home-team advantage, further reduces the range of PageRank's superiority. We propose a new PageRank variant which outperforms PageRank in all evaluated settings, yet shares its sensitivity to increased randomness in the data. Our main findings are confirmed by evaluating the ranking algorithms on real data. Our work demonstrates the danger of using novel metrics and algorithms without considering their limits of applicability.
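The two rankings compared in this abstract can be illustrated with a minimal sketch. It assumes the common convention that every game adds a directed link from the loser to the winner, uses an illustrative damping factor of 0.85, and ignores draws and home advantage. The function names are hypothetical, and the new PageRank variant proposed in the paper is not reproduced.

```python
def win_counts(games):
    """Benchmark ranking: number of wins of each team.

    games: list of (winner, loser) pairs.
    """
    wins = {}
    for winner, loser in games:
        wins[winner] = wins.get(winner, 0) + 1
        wins.setdefault(loser, 0)  # winless teams still appear with 0
    return wins


def sports_pagerank(games, alpha=0.85, max_iter=1000, tol=1e-12):
    """PageRank on the network where each loser links to the team that beat it."""
    teams = sorted({team for game in games for team in game})
    n = len(teams)
    out_links = {team: [] for team in teams}
    for winner, loser in games:
        out_links[loser].append(winner)  # one link per game

    score = {team: 1.0 / n for team in teams}
    for _ in range(max_iter):
        # Undefeated teams have no out-links; spread their score uniformly.
        dangling = sum(score[t] for t in teams if not out_links[t])
        new = {t: (1.0 - alpha) / n + alpha * dangling / n for t in teams}
        for loser, winners in out_links.items():
            for winner in winners:
                new[winner] += alpha * score[loser] / len(winners)
        delta = sum(abs(new[t] - score[t]) for t in teams)
        score = new
        if delta < tol:
            break
    return score


games = [("A", "B"), ("A", "C"), ("A", "D"), ("X", "A"), ("Y", "E")]
wins = win_counts(games)
pr = sports_pagerank(games)
print(wins["X"] == wins["Y"])  # True: wins cannot separate X and Y
print(pr["X"] > pr["Y"])       # True: X beat the strong team A
```

X and Y each have one win, but X's win came against A, so PageRank places X above Y. This extra information is what PageRank can exploit when few games have been played; the abstract reports that, on calibrated model data, the advantage disappears once a larger fraction of games is available.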
Submitted 11 December, 2020;
originally announced December 2020.
-
The fragility of opinion formation in a complex world
Authors:
Matúš Medo,
Manuel S. Mariani,
Linyuan Lü
Abstract:
With vast amounts of high-quality information at our fingertips, how is it possible that many people believe that the Earth is flat and vaccination harmful? Motivated by this question, we quantify the implications of an opinion formation mechanism whereby an uninformed observer gradually forms opinions about a world composed of subjects interrelated by a signed network of mutual trust and distrust. We show numerically and analytically that the observer's resulting opinions are highly inconsistent (they tend to be independent of the observer's initial opinions) and unstable (they exhibit wide stochastic variations). Opinion inconsistency and instability increase with the complexity of the world, represented by the number of subjects; this can be prevented by suitably expanding the observer's initial amount of information. Our findings imply that even an individual who initially trusts credible information sources may end up trusting the deceptive ones if at least a small number of trust relations exist between the credible and deceptive sources.
Submitted 23 October, 2020;
originally announced October 2020.
-
Time-invariant degree growth in preferential attachment network models
Authors:
Jun Sun,
Matúš Medo,
Steffen Staab
Abstract:
Preferential attachment drives the evolution of many complex networks. Analytical studies of it mostly consider the simplest case of a network that grows uniformly in time, even though many real networks grow at an accelerating rate. Motivated by the observation that the average degree growth of nodes is time-invariant in empirical network data, we study the degree dynamics in the relevant class of network models where preferential attachment is combined with heterogeneous node fitness and aging. We propose a novel analytical framework based on the time-invariance of the studied systems and show that it is self-consistent only for two special network growth forms: uniform and exponential network growth. Conversely, the breaking of such time-invariance explains the winner-takes-all effect in some model settings, revealing the connection between the Bose-Einstein condensation in the Bianconi-Barabási model and similar gelation in superlinear preferential attachment. Aging is necessary to reproduce realistic node degree growth curves and can prevent the winner-takes-all effect under weak conditions. Our results are verified by extensive numerical simulations.
Submitted 22 January, 2020;
originally announced January 2020.
-
Simple regularities in the dynamics of online news impact
Authors:
Matúš Medo,
Manuel S. Mariani,
Linyuan Lü
Abstract:
Online news can quickly reach and affect millions of people, yet we do not know whether any dynamical regularities govern its impact on the public. We use data from two major news outlets, BBC and New York Times, where the number of user comments can be used as a proxy of news impact. We find that the impact dynamics of online news articles do not exhibit popularity patterns found in many other social and information systems. In particular, a simple exponential distribution yields a better fit to the empirical news impact distributions than a power-law distribution. This observation is explained by the absent or limited influence of the otherwise omnipresent rich-get-richer mechanism in the analyzed data. The temporal dynamics of news impact exhibit a universal exponential decay, which allows us to collapse individual news trajectories into a single elementary curve. We also show how daily variations of user activity directly influence the dynamics of article impact. Our findings challenge the universal applicability of popularity dynamics patterns found in other social contexts.
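The exponential-versus-power-law comparison can be sketched with maximum-likelihood fits. This is an illustration under stated assumptions rather than the paper's procedure: comment counts are treated as continuous, both fits start at the smallest observed value `xmin`, and the fits are compared by raw log-likelihood (each has one parameter); a careful analysis would use a likelihood-ratio test such as Vuong's. The data below are synthetic, not BBC or New York Times comments.

```python
import math
import random


def exponential_loglik(xs, xmin):
    """Log-likelihood of the MLE shifted exponential for x >= xmin."""
    shifted = [x - xmin for x in xs]
    lam = len(xs) / sum(shifted)  # MLE: 1 / mean(x - xmin)
    return len(xs) * math.log(lam) - lam * sum(shifted)


def power_law_loglik(xs, xmin):
    """Log-likelihood of the MLE continuous power law for x >= xmin."""
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1.0 + len(xs) / s  # MLE exponent
    return len(xs) * (math.log(alpha - 1.0) - math.log(xmin)) - alpha * s


rng = random.Random(42)
# Synthetic "comment counts" with an exponential tail (mean excess of 50).
comments = [1.0 + rng.expovariate(1 / 50) for _ in range(2000)]
xmin = min(comments)
print(exponential_loglik(comments, xmin) > power_law_loglik(comments, xmin))  # True
```

Swapping in heavy-tailed synthetic data reverses the verdict, which is what makes the comparison informative about the presence or absence of a rich-get-richer mechanism.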
Submitted 22 January, 2021; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data
Authors:
Shuqi Xu,
Manuel Sebastian Mariani,
Linyuan Lü,
Matúš Medo
Abstract:
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
Submitted 15 January, 2020;
originally announced January 2020.
-
Optimal timescale for community detection in growing networks
Authors:
Matus Medo,
An Zeng,
Yi-Cheng Zhang,
Manuel S. Mariani
Abstract:
Time-stamped data are increasingly available for many social, economic, and information systems that can be represented as networks growing with time. The World Wide Web, social contact networks, and citation networks of scientific papers and online news articles, for example, are of this kind. Static methods can be inadequate for the analysis of growing networks as they miss essential information on the system's dynamics. At the same time, time-aware methods require the choice of an observation timescale, yet we lack principled ways to determine it. We focus on the popular community detection problem which aims to partition a network's nodes into meaningful groups. We use a multi-layer quality function to show, on both synthetic and real datasets, that the observation timescale that leads to optimal communities is tightly related to the system's intrinsic aging timescale that can be inferred from the time-stamped network data. The use of temporal information leads to drastically different conclusions on the community structure of real information networks, which challenges the current understanding of the large-scale organization of growing networks. Our findings indicate that before attempting to assess structural patterns of evolving networks, it is vital to uncover the timescales of the dynamical processes that generated them.
Submitted 1 August, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
The long-term impact of ranking algorithms in growing networks
Authors:
Shilun Zhang,
Matúš Medo,
Linyuan Lü,
Manuel Sebastian Mariani
Abstract:
When we search online for content, we are constantly exposed to rankings. For example, web search results are presented as a ranking, and online bookstores often show us lists of best-selling books. While popularity-based ranking algorithms (like Google's PageRank) have been extensively studied in previous works, we still lack a clear understanding of their potential systemic consequences. In this work, we fill this gap by introducing a new model of network growth that allows us to compare the properties of the networks generated under the influence of different ranking algorithms. We show that by correcting for the omnipresent age bias of popularity-based ranking algorithms, the resulting networks exhibit a significantly larger agreement between the nodes' inherent quality and their long-term popularity, and a less concentrated popularity distribution. To further promote popularity diversity, we introduce and validate a perturbation of the original rankings where a small number of randomly-selected nodes are promoted to the top of the ranking. Our findings take the first steps toward a model-based understanding of the long-term impact of popularity-based ranking algorithms, and could inform the design of improved information filtering tools.
Submitted 19 November, 2018; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Early identification of important patents through network centrality
Authors:
Manuel Sebastian Mariani,
Matus Medo,
François Lafond
Abstract:
One of the most challenging problems in technological forecasting is to identify as early as possible those technologies that have the potential to lead to radical changes in our society. In this paper, we use the US patent citation network (1926-2010) to test our ability to identify a list of historically significant patents early through citation network analysis. We show that in order to effectively uncover these patents shortly after they are issued, we need to go beyond raw citation counts and take into account both the citation network topology and temporal information. In particular, an age-normalized measure of patent centrality, called rescaled PageRank, allows us to identify the significant patents earlier than citation count and PageRank score do. In addition, we find that while high-impact patents tend to rely on other high-impact patents in a similar way as scientific papers, the patents' citation dynamics is significantly slower than that of papers, which makes the early identification of significant patents more challenging than that of significant papers.
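Rescaled PageRank is, to our understanding of the authors' earlier work, a z-score: each item's PageRank is compared with the mean and standard deviation of the scores of items of similar age. A minimal sketch of that rescaling step follows, assuming the raw PageRank scores are already computed and sorted by issue date. The window size (how many neighbors in time count as "similar age") is a free parameter here; the value used in the paper is not stated in this abstract, and `rescale_by_age` is a hypothetical name.

```python
import statistics


def rescale_by_age(scores, window):
    """Z-score each score against the `window` items closest in time.

    `scores` must be ordered by publication or issue date (oldest first).
    """
    n = len(scores)
    if not 0 < window <= n:
        raise ValueError("window must be between 1 and len(scores)")
    rescaled = []
    for i in range(n):
        # Center the window on item i, shifting it inward near the ends.
        start = min(max(0, i - window // 2), n - window)
        reference = scores[start:start + window]
        mu = statistics.fmean(reference)
        sigma = statistics.pstdev(reference)
        rescaled.append((scores[i] - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Older items (first three) have had more time to accumulate score.
raw = [9.0, 10.0, 8.0, 1.0, 3.0, 2.0]
r = rescale_by_age(raw, window=3)
print(round(r[1], 3), round(r[4], 3))  # 1.225 1.225
```

Item 4 has a raw score of 3, far below item 1's 10, yet both lead their own age window and receive the same rescaled score; this is what lets a recent item compete with older ones on the same scale.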
Submitted 25 October, 2017;
originally announced October 2017.
-
Ranking in evolving complex networks
Authors:
Hao Liao,
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang,
Ming-Yang Zhou
Abstract:
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and of which biases may impair their effectiveness. Well-established ranking algorithms (such as Google's popular PageRank) are static in nature and, as a consequence, exhibit important shortcomings when applied to real networks that rapidly evolve in time. Recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits of including the temporal dimension for tasks such as prediction of real network traffic, prediction of future links, and identification of highly significant nodes.
Submitted 26 April, 2017;
originally announced April 2017.
-
Quantifying and suppressing ranking bias in a large citation network
Authors:
Giacomo Vaccario,
Matus Medo,
Nicolas Wider,
Manuel Sebastian Mariani
Abstract:
It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age, since older papers have had more time to attract citations. Various procedures aim to suppress these biases and give rise to new normalized indicators, such as the relative citation count. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well-known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. We propose a general normalization procedure motivated by the $z$-score which produces much less biased rankings when applied to citation count and PageRank score.
Submitted 23 March, 2017;
originally announced March 2017.
-
Randomizing growing networks with a time-respecting null model
Authors:
Zhuo-Ming Ren,
Manuel Sebastian Mariani,
Yi-Cheng Zhang,
Matus Medo
Abstract:
Complex networks are often used to represent systems that are not static but grow with time: people make new friendships, new papers are published and refer to the existing ones, and so forth. To assess the statistical significance of measurements made on such networks, we propose a randomization methodology---a time-respecting null model---that preserves both the network's degree sequence and the time evolution of individual nodes' degree values. By preserving the temporal linking patterns of the analyzed system, the proposed model is able to factor out the effect of the system's temporal patterns on its structure. We apply the model to the citation network of Physical Review scholarly papers and the citation network of US movies. The model reveals that the two datasets are strikingly different with respect to their degree-degree correlations, and we discuss the important implications of this finding for the information provided by paradigmatic node centrality metrics such as indegree and Google's PageRank. The randomization methodology proposed here can be used to assess the significance of any structural property in growing networks, which could bring new insights into the problems where null models play a critical role, such as the detection of communities and network motifs.
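One simple way to build a randomization of this kind is sketched below: link targets are permuted only among links created in the same time period. This is an illustrative assumption, not necessarily the paper's exact procedure. It keeps each node's out-degree and the number of links each node receives per period, but, unlike a careful implementation, it does not reject self-loops, repeated links, or links pointing to a node published later within the same period. The function name and the `period_of` parameter are hypothetical.

```python
import random
from collections import defaultdict


def time_respecting_shuffle(edges, period_of, seed=None):
    """Randomize a growing network, preserving per-period degrees.

    edges: list of (source, target, time) links.
    period_of: maps a time stamp to a period label, e.g. a year.
    """
    rng = random.Random(seed)
    by_period = defaultdict(list)
    for edge in edges:
        by_period[period_of(edge[2])].append(edge)

    randomized = []
    for period_edges in by_period.values():
        targets = [target for _, target, _ in period_edges]
        rng.shuffle(targets)  # permute targets within the period only
        for (source, _, time), target in zip(period_edges, targets):
            randomized.append((source, target, time))
    return randomized


citations = [("p3", "p1", 2001), ("p4", "p1", 2002), ("p4", "p2", 2002),
             ("p5", "p3", 2002), ("p5", "p2", 2003)]
null_net = time_respecting_shuffle(citations, period_of=lambda t: t, seed=1)
```

Measuring a property (for example, degree-degree correlations) on many such randomized copies and comparing it with the value on the real network gives an estimate of its statistical significance.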
Submitted 16 November, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Identification of milestone papers through time-balanced network centrality
Authors:
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Citations between scientific papers and related bibliometric indices, such as the $h$-index for authors and the impact factor for journals, are being increasingly used, often in controversial ways, as quantitative tools for research evaluation. Yet, a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the 449,935 papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics. We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that a paper's score is not biased by its age, is the best-performing metric overall in identifying the Milestone Letters. The lack of time bias in the new metric also makes it possible to use it to compare papers of different age on the same scale. We find that network-based metrics identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example the World Wide Web and online social networks. An interactive Web platform where it is possible to view the ranking of the APS papers by rescaled PageRank is available at http://www.sciencenow.info.
Submitted 8 November, 2016; v1 submitted 30 August, 2016;
originally announced August 2016.
-
The essential role of time in network-based recommendation
Authors:
Alexandre Vidmer,
Matus Medo
Abstract:
Random walks on bipartite networks have been used extensively to design personalized recommendation methods. While aging has been identified as a key component in the growth of information networks, most research has focused on the networks' structural properties and neglected the often available time information. Time has been largely ignored both by the investigated recommendation methods and by the methodology used to evaluate them. We show that this time-unaware approach overestimates the methods' recommendation performance. Motivated by microscopic rules of network growth, we propose a time-aware modification of an existing recommendation method and show that, by combining the temporal and structural aspects, it outperforms the existing methods. The performance improvements are particularly striking in systems with fast aging.
Submitted 15 June, 2016;
originally announced June 2016.
-
Model-based evaluation of scientific impact indicators
Authors:
Matus Medo,
Giulio Cimini
Abstract:
Using bibliometric data artificially generated through a model of citation dynamics calibrated on empirical data, we compare several indicators for the scientific impact of individual researchers. The use of such a controlled setup has the advantage of avoiding the biases present in real databases, and allows us to assess which aspects of the model dynamics and which traits of individual researchers a particular indicator actually reflects. We find that the simple citation average performs well in capturing the intrinsic scientific ability of researchers, regardless of the length of their career. On the other hand, when productivity complements ability in the evaluation process, the well-known $h$ and $g$ indices reveal their potential, yet their normalized variants do not always yield a fair comparison between researchers at different career stages. Notably, the use of logarithmic units for citation counts allows us to build simple indicators with performance equal to that of $h$ and $g$. Our analysis may provide useful hints for a proper use of bibliometric indicators. Additionally, our framework can be extended by including other aspects of the scientific production process and citation dynamics, with the potential to become a standard tool for the assessment of impact metrics.
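For reference, the textbook definitions of the indicators named above can be written in a few lines. This sketch covers only the citation average and the standard $h$- and $g$-indices (with the $g$-index capped at the number of papers, a common convention); the normalized and logarithmic variants studied in the paper are not reproduced.

```python
import statistics


def citation_average(citations):
    """Mean number of citations per paper."""
    return statistics.fmean(citations)


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers have at least g**2 citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


papers = [10, 8, 5, 4, 3]
print(citation_average(papers), h_index(papers), g_index(papers))  # 6.0 4 5
```

The citation average ignores how many papers a researcher has written, while $h$ and $g$ grow with output; this is the ability-versus-productivity distinction drawn in the abstract.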
Submitted 14 June, 2016;
originally announced June 2016.
-
Network-based recommendation algorithms: A review
Authors:
Fei Yu,
An Zeng,
Sebastien Gillard,
Matus Medo
Abstract:
Recommender systems are a vital tool that helps us to overcome the information overload problem. They are used by most e-commerce websites and attract the interest of a broad scientific community. A recommender system uses data on users' past preferences to choose new items that might be appreciated by a given individual user. While many approaches to recommendation exist, the approach based on a network representation of the input data has gained considerable attention in the past. We review here a broad range of network-based recommendation algorithms and, for the first time, compare their performance on three distinct real datasets. We present recommendation topics that go beyond the mere question of which algorithm to use, such as the possible influence of recommendation on the evolution of systems that use it, and finally discuss open research directions and challenges.
Submitted 19 November, 2015;
originally announced November 2015.
-
Prediction in complex systems: the case of the international trade network
Authors:
Alexandre Vidmer,
An Zeng,
Matúš Medo,
Yi-Cheng Zhang
Abstract:
Predicting the future evolution of complex systems is one of the main challenges in complexity science. Based on a current snapshot of a network, link prediction algorithms aim to predict its future evolution. Here we apply link prediction algorithms to data on international trade between countries. These data can be represented as a complex network where links connect countries with the products that they export. Link prediction techniques based on heat and mass diffusion processes are employed to obtain predictions for products exported in the future. These baseline predictions are improved using a recent metric of country fitness and product similarity. The overall best results are achieved with a newly developed metric of product similarity which takes advantage of causality in the network evolution.
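The two baseline processes mentioned above can be sketched as follows, assuming the standard mass diffusion (often called ProbS) and heat conduction (HeatS) formulations on an unweighted country-product network. Function and variable names are illustrative, and the fitness- and causality-based improvements described in the abstract are not reproduced.

```python
from collections import defaultdict


def diffusion_scores(links, target, method="mass"):
    """Score the items a target node is not yet linked to by two-step diffusion.

    links: iterable of (country, product) pairs in a bipartite network.
    method: "mass" (mass diffusion, ProbS) or "heat" (heat conduction, HeatS).
    """
    if method not in ("mass", "heat"):
        raise ValueError("method must be 'mass' or 'heat'")
    items_of, users_of = defaultdict(set), defaultdict(set)
    for user, item in links:
        items_of[user].add(item)
        users_of[item].add(user)
    owned = items_of.get(target, set())  # items of the target carry resource 1

    # Step 1: items -> users.
    user_res = {}
    for user, items in items_of.items():
        if method == "mass":  # each owned item splits its unit among its users
            user_res[user] = sum(1.0 / len(users_of[i]) for i in items if i in owned)
        else:  # each user takes the average value of its items
            user_res[user] = sum(1.0 for i in items if i in owned) / len(items)

    # Step 2: users -> items (only items the target does not have yet).
    scores = {}
    for item, users in users_of.items():
        if item in owned:
            continue
        if method == "mass":  # each user splits its resource among its items
            scores[item] = sum(user_res[u] / len(items_of[u]) for u in users)
        else:  # each item takes the average value of its users
            scores[item] = sum(user_res[u] for u in users) / len(users)
    return scores


exports = [("A", "p1"), ("A", "p2"), ("B", "p1"), ("B", "p3"),
           ("C", "p2"), ("C", "p3"), ("C", "p4"), ("D", "p4")]
for method in ("mass", "heat"):
    scores = diffusion_scores(exports, "A", method)
    print(method, sorted(scores, key=scores.get, reverse=True))  # p3 ranks first
```

In this toy network both processes happen to agree. On real data, heat conduction tends to favor less popular products and mass diffusion more popular ones, which is one reason hybrids of the two are common.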
Submitted 17 November, 2015;
originally announced November 2015.
-
Identification and modeling of discoverers in online social systems
Authors:
Matus Medo,
Manuel S. Mariani,
An Zeng,
Yi-Cheng Zhang
Abstract:
The dynamics of individuals is of essential importance for understanding the evolution of social systems. Most existing models assume that individuals in diverse systems, ranging from social networks to e-commerce, all tend to choose what is already popular. We develop an analytical time-aware framework which shows that when individuals make choices (which item to buy, for example) in online social systems, a small fraction of them is consistently successful in discovering popular items long before they actually become popular. We argue that these users, whom we refer to as discoverers, are fundamentally different from the previously known opinion leaders, influentials, and innovators. We use the proposed framework to demonstrate that discoverers are present in a wide range of systems. Once identified, they can be used to predict the future success of items. We propose a network model which reproduces the discovery patterns observed in the real data. Furthermore, data produced by the model pose a fundamental challenge to classical ranking algorithms, which neglect the time of link creation and thus fail to discriminate between discoverers and ordinary users in the data. Our results open the door to the qualitative and quantitative study of fine temporal patterns in social systems and have far-reaching implications for network modeling and algorithm design.
Submitted 4 September, 2015;
originally announced September 2015.
-
Ranking nodes in growing networks: When PageRank fails
Authors:
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang
Abstract:
PageRank is arguably the most popular ranking algorithm and is applied to real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm's efficacy and properties of the network on which it acts has not yet been fully understood. We study here PageRank's performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail to identify the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms based on the temporal linking patterns of these systems are needed to better rank the nodes.
Submitted 3 September, 2015;
originally announced September 2015.
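For readers who want to see the static baseline that the abstract argues against, here is a minimal Python sketch of standard PageRank computed by power iteration. The damping factor `alpha=0.85`, the uniform teleportation, and the uniform redistribution of the score held by dangling nodes are common textbook conventions assumed here; they are not necessarily the settings used in the paper, and the function name `pagerank` is illustrative.

```python
import numpy as np


def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Static PageRank by power iteration.

    adj[i, j] = 1 if node i links to (e.g. cites) node j.
    Dangling nodes (no out-links) spread their score uniformly over all nodes.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    dangling = out_deg == 0
    # Row-normalise the adjacency matrix; rows of dangling nodes stay zero.
    safe_deg = np.where(dangling, 1.0, out_deg)
    P = adj / safe_deg[:, None]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        dangling_mass = p[dangling].sum()
        # Follow a link with probability alpha, teleport otherwise.
        new_p = alpha * (p @ P + dangling_mass / n) + (1 - alpha) / n
        if np.abs(new_p - p).sum() < tol:
            return new_p
        p = new_p
    return p
```

For a toy citation network in which papers 1, 2 and 3 all cite paper 0, `pagerank([[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])` ranks paper 0 first. The score ignores when each link was created, which is precisely the limitation the paper examines.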
-
Modeling mutual feedback between users and recommender systems
Authors:
An Zeng,
Chi Ho Yeung,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Recommender systems daily influence our decisions on the Internet. While considerable attention has been given to issues such as recommendation accuracy and user privacy, the long-term mutual feedback between a recommender system and the decisions of its users has been neglected so far. We propose here a model of network evolution which allows us to study the complex dynamics induced by this feedback, including the hysteresis effect which is typical for systems with non-linear dynamics. Despite the popular belief that recommendation helps users to discover new things, we find that the long-term use of recommendation can contribute to the rise of extremely popular items and thus ultimately narrow the user choice. These results are supported by measurements of the time evolution of item popularity inequality in real systems. We show that this adverse effect of recommendation can be tamed by sacrificing part of short-term recommendation accuracy.
Submitted 7 August, 2015;
originally announced August 2015.
-
Ranking users, papers and authors in online scientific communities
Authors:
Hao Liao,
Rui Xiao,
Giulio Cimini,
Matus Medo
Abstract:
The ever-increasing quantity and complexity of scientific production have made it difficult for researchers to keep track of advances in their own fields. This, together with the growing popularity of online scientific communities, calls for the development of effective information filtering tools. We propose here a method to simultaneously compute reputation of users and quality of scientific artifacts in an online scientific community. Evaluation on artificially-generated data and real data from the Econophysics Forum is used to determine the method's best-performing variants. We show that when the method is extended by considering author credit, its performance improves on multiple levels. In particular, top papers have higher citation count and top authors have higher $h$-index than top papers and top authors chosen by other algorithms.
Submitted 9 May, 2014; v1 submitted 13 November, 2013;
originally announced November 2013.
-
Statistical validation of high-dimensional models of growing networks
Authors:
Matus Medo
Abstract:
The abundance of models of complex networks and the currently insufficient validation standards make it difficult to judge which models are strongly supported by data and which are not. We focus here on likelihood maximization methods for models of growing networks with many parameters and compare their performance on artificial and real datasets. While high dimensionality of the parameter space harms the performance of direct likelihood maximization on artificial data, this can be improved by introducing a suitable penalization term. Likelihood maximization on real data shows that the presented approach is able to discriminate among available network models. To make large-scale datasets accessible to this kind of analysis, we propose a subset sampling technique and show that it yields substantial model evidence in a fraction of the time necessary for the analysis of the complete data.
Submitted 30 January, 2014; v1 submitted 8 November, 2013;
originally announced November 2013.
-
Information filtering via hybridization of similarity preferential diffusion processes
Authors:
An Zeng,
Alexandre Vidmer,
Matus Medo,
Yi-Cheng Zhang
Abstract:
The recommender system is one of the most promising ways to address the information overload problem in online systems. Based on a user's historical record, the recommender system can find interesting and relevant objects for the user within a huge information space. Many physical processes, such as mass diffusion and heat conduction, have been applied to design recommendation algorithms. The hybridization of these two algorithms has been shown to provide both accurate and diverse recommendation results. In this paper, we propose two similarity preferential diffusion processes. Extensive experimental analyses on two benchmark data sets demonstrate that both recommendation accuracy and diversity are improved due to the similarity preference in the diffusion. The hybridization of the similarity preferential diffusion processes is shown to significantly outperform the state-of-the-art recommendation algorithm. Finally, our analysis of network sparsity shows that there is a significant difference between dense and sparse systems, indicating that all the former conclusions on recommendation in the literature should be reexamined in sparse systems.
Submitted 31 August, 2013;
originally announced September 2013.
-
The effect of the initial network configuration on preferential attachment
Authors:
Yves Berset,
Matus Medo
Abstract:
The classical preferential attachment model is sensitive to the choice of the initial configuration of the network. As the number of initial nodes and their degree grow, so does the time needed for an equilibrium degree distribution to be established. We study this phenomenon, provide estimates of the equilibration time, and characterize the degree distribution cutoff observed at finite times. When the initial network is dense and exceeds a certain small size, there is no equilibration and a suitable statistical test can always discern the produced degree distribution from the equilibrium one. As a by-product, the weighted Kolmogorov-Smirnov statistic is demonstrated to be more suitable for statistical analysis of power-law distributions with cutoff when the data is ample.
Submitted 1 May, 2013;
originally announced May 2013.
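The sensitivity to the initial configuration can be explored with a small simulation. The sketch below grows a network by linear preferential attachment starting from a complete graph on `n0` nodes; using a complete graph as the dense initial configuration, and the name `preferential_attachment`, are assumptions for illustration rather than the paper's exact setup. Comparing the degree sequences produced with small and large `n0` at the same final size gives a rough feel for the slow equilibration described above.

```python
import random


def preferential_attachment(n0, m, n_total, seed=None):
    """Grow a network by linear preferential attachment.

    Starts from a complete graph on n0 nodes (the assumed initial
    configuration) and adds nodes one at a time, each creating m links to
    distinct existing nodes chosen with probability proportional to degree.
    Returns the list of node degrees.
    """
    if n0 < 2 or m > n0:
        raise ValueError("need n0 >= 2 and m <= n0")
    rng = random.Random(seed)
    degree = [n0 - 1] * n0
    # Each node appears in `endpoints` once per incident link, so a uniform
    # draw from this list is a degree-proportional draw.
    endpoints = [i for i in range(n0) for _ in range(n0 - 1)]
    for new in range(n0, n_total):
        targets = set()
        while len(targets) < m:  # resample until m distinct targets
            targets.add(rng.choice(endpoints))
        degree.append(m)
        for t in targets:
            degree[t] += 1
            endpoints.extend([t, new])
    return degree
```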
-
Trend prediction in temporal bipartite networks: the case of Movielens, Netflix, and Digg
Authors:
An Zeng,
Stanislao Gualdi,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Online systems where users purchase or collect items of some kind can be effectively represented by temporal bipartite networks where both nodes and links are added with time. We use this representation to predict which items might become popular in the near future. Various prediction methods are evaluated on three distinct datasets originating from popular online services (Movielens, Netflix, and Digg). We show that the prediction performance can be further enhanced if the user social network is known and centrality of individual users in this network is used to weight their actions.
Submitted 13 February, 2013;
originally announced February 2013.
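One simple predictor of the kind such studies evaluate ranks items by how many links they gained in a recent time window. The sketch below is an illustration under stated assumptions, not the paper's method: the `(user, item, timestamp)` event format, the window length, and the optional `user_weight` mapping (a hypothetical stand-in for a user's centrality in the social network) are all choices made here.

```python
from collections import defaultdict


def predict_trending(events, now, window, top_n, user_weight=None):
    """Rank items by (optionally weighted) links gained in (now - window, now].

    events: iterable of (user, item, timestamp) tuples of a temporal
    bipartite network. user_weight: optional dict user -> weight, e.g. a
    centrality score; users missing from it get weight 1.
    """
    score = defaultdict(float)
    for user, item, t in events:
        if now - window < t <= now:
            w = 1.0 if user_weight is None else user_weight.get(user, 1.0)
            score[item] += w
    # Highest recent score first; ties broken by item id for determinism.
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

With unit weights this is a pure recent-popularity baseline; giving central users a larger weight lets a single action by an influential user outweigh several ordinary ones.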
-
The role of taste affinity in agent-based models for social recommendation
Authors:
Giulio Cimini,
An Zeng,
Matus Medo,
Duanbing Chen
Abstract:
In the Internet era, online social media emerged as the main tool for sharing opinions and information among individuals. In this work we study an adaptive model of a social network where directed links connect users with similar tastes, and over which information propagates through social recommendation. Agent-based simulations of two different artificial settings for modeling user tastes are compared with patterns seen in real data, suggesting that users differing in their scope of interests is a more realistic assumption than users differing only in their particular interests. We further introduce an extensive set of similarity metrics based on users' past assessments, and evaluate their use in the given social recommendation model with both artificial simulations and real data. Superior recommendation performance is observed for similarity metrics that give preference to users with small scope, who thus act as selective filters in social recommendation.
Submitted 18 January, 2013;
originally announced January 2013.
-
Crowd Avoidance and Diversity in Socio-Economic Systems and Recommendation
Authors:
Stanislao Gualdi,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Recommender systems recommend objects regardless of potential adverse effects of their overcrowding. We address this shortcoming by introducing crowd-avoiding recommendation where each object can be shared by only a limited number of users or where object utility diminishes with the number of users sharing it. We use real data to show that contrary to expectations, the introduction of these constraints enhances recommendation accuracy and diversity even in systems where overcrowding is not detrimental. The observed accuracy improvements are explained in terms of removing potential bias of the recommendation method. We finally propose a way to model artificial socio-economic systems with crowd avoidance and obtain first analytical results.
Submitted 9 January, 2013;
originally announced January 2013.
-
Network-based information filtering algorithms: ranking and recommendation
Authors:
Matus Medo
Abstract:
After the Internet and the World Wide Web became popular and widely available, the electronically stored online interactions of individuals quickly emerged as a challenge for researchers and, perhaps even faster, as a source of valuable information for entrepreneurs. We now have detailed records of informal friendship relations in social networks, purchases on e-commerce sites, various sorts of information being sent from one user to another, online collections of web bookmarks, and many other data sets that allow us to pose questions of interest from both an academic and a commercial point of view. For example, which other users of a social network might you want to befriend? Which other items might you be interested in purchasing? Who are the most influential users in a network? Which web page might you want to visit next? All these questions are not only interesting per se, but the answers to them may help entrepreneurs provide better service to their customers and, ultimately, increase their profits.
Submitted 22 August, 2012;
originally announced August 2012.
-
Measuring quality, reputation and trust in online communities
Authors:
Hao Liao,
Giulio Cimini,
Matus Medo
Abstract:
In the Internet era, the information overload and the challenge of detecting quality content have raised the issue of how to rank both resources and users in online communities. In this paper we develop a general ranking method that can simultaneously evaluate users' reputation and objects' quality in an iterative procedure, and that exploits the trust relationships and social acquaintances of users as an additional source of information. We test our method on two real online communities, the EconoPhysics forum and the Last.fm music catalogue, and determine how different variants of the algorithm influence the resultant ranking. We show the benefits of considering trust relationships, and define the form of the algorithm best suited to common situations.
Submitted 22 August, 2012; v1 submitted 20 August, 2012;
originally announced August 2012.
-
Recommendation systems in the scope of opinion formation: a model
Authors:
Marcel Blattner,
Matus Medo
Abstract:
Aggregated data in real-world recommender applications often feature fat-tailed distributions of the number of times individual items have been rated or favored. We propose a model to simulate such data. The model is mainly based on social interactions and opinion formation taking place on a complex network with a given topology. A threshold mechanism is used to govern the decision-making process that determines whether a user is or is not interested in an item. We demonstrate the validity of the model by fitting attendance distributions from different real data sets. The model is mathematically analyzed by investigating its master equation. Our approach provides an attempt to understand recommender systems' data as a social process. The model can serve as a starting point to generate artificial data sets useful for testing and evaluating recommender systems.
Submitted 13 August, 2012; v1 submitted 14 June, 2012;
originally announced June 2012.
-
Recommender Systems
Authors:
Linyuan Lü,
Matus Medo,
Chi Ho Yeung,
Yi-Cheng Zhang,
Zi-Ke Zhang,
Tao Zhou
Abstract:
The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research on recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in future developments. In addition to algorithms, physical aspects are described to illustrate the macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has great scientific depth and combines diverse research fields, which makes it of interest to physicists as well as interdisciplinary researchers.
Submitted 6 February, 2012;
originally announced February 2012.
-
Temporal effects in the growth of networks
Authors:
Matus Medo,
Giulio Cimini,
Stanislao Gualdi
Abstract:
We show that to explain the growth of the citation network by preferential attachment (PA), one has to accept that individual nodes exhibit heterogeneous fitness values that decay with time. While previous PA-based models assumed either heterogeneity or decay in isolation, we propose a simple analytically treatable model that combines these two factors. Depending on the input assumptions, the resulting degree distribution shows an exponential, log-normal or power-law decay, which makes the model an apt candidate for modeling a wide range of real systems.
Submitted 26 September, 2011;
originally announced September 2011.
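The combination of heterogeneous fitness and temporal decay can be simulated directly. In this sketch each new node creates one link, and the attachment probability is assumed to be proportional to `(k_i + 1) * eta_i * exp(-(t - t_i) / tau)`, where `k_i` is in-degree, `eta_i` fitness and `t_i` the node's birth time. The `+ 1` offset (so that nodes without links can still be chosen), the log-normal fitness distribution, the exponential decay, and the function name are illustrative assumptions, not the paper's exact specification.

```python
import math
import random


def grow_with_fitness_and_aging(n_total, tau, seed=None):
    """Grow a network where node i attracts links with probability
    proportional to (k_i + 1) * eta_i * exp(-(t - t_i) / tau).

    Node i is born at time step i and each new node creates one link.
    Returns (in-degrees, fitness values).
    """
    rng = random.Random(seed)
    indeg = [0]
    eta = [rng.lognormvariate(0.0, 1.0)]
    for t in range(1, n_total):
        # Attractiveness: preferential attachment x fitness x aging.
        weights = [(indeg[i] + 1) * eta[i] * math.exp(-(t - i) / tau)
                   for i in range(t)]
        target = rng.choices(range(t), weights=weights, k=1)[0]
        indeg[target] += 1
        indeg.append(0)
        eta.append(rng.lognormvariate(0.0, 1.0))
    return indeg, eta
```

The loop is quadratic in `n_total` and meant for small experiments only; `tau` must be positive and not so small that every weight underflows to zero.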
-
Influence, originality and similarity in directed acyclic graphs
Authors:
Stanislao Gualdi,
Matus Medo,
Yi-Cheng Zhang
Abstract:
We introduce a framework for network analysis based on random walks on directed acyclic graphs where the probability of passing through a given node is the key ingredient. We illustrate its use in evaluating the mutual influence of nodes and discovering seminal papers in a citation network. We further introduce a new similarity metric and test it in a simple personalized recommendation process. This metric's performance is comparable to that of classical similarity metrics, thus further supporting the validity of our framework.
Submitted 18 August, 2011;
originally announced August 2011.
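The key quantity, the probability that a random walk passes through a node, can be computed exactly on a directed acyclic graph. The sketch below assumes the simplest walk: start at a chosen node, step to a uniformly random out-neighbour, and stop at a node with no out-links. The paper's walk may differ (for example in how it starts or terminates), so treat this rule and the name `passage_probabilities` as assumptions. It uses `graphlib`, available in Python 3.9 and later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def passage_probabilities(out_links, start):
    """Probability that a random walk from `start` passes through each node.

    out_links maps node -> list of nodes it points to (e.g. papers it cites).
    The walker moves to a uniformly random out-neighbour and stops at nodes
    without out-links. In an acyclic graph each node is visited at most once,
    so probabilities can be propagated in topological order.
    """
    # graphlib expects node -> predecessors; a node's predecessors are the
    # nodes linking to it, so every node is processed after all its sources.
    preds = defaultdict(set)
    for u, vs in out_links.items():
        preds.setdefault(u, set())
        for v in vs:
            preds[v].add(u)
    prob = defaultdict(float)
    prob[start] = 1.0
    for u in TopologicalSorter(preds).static_order():
        targets = out_links.get(u, [])
        if prob[u] and targets:
            share = prob[u] / len(targets)
            for v in targets:
                prob[v] += share
    return {node: p for node, p in prob.items() if p > 0}
```

Summing these probabilities over walks started from many recent papers could serve as one crude influence score for older papers. If the input contains a cycle, `graphlib` raises `CycleError`.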
-
Enhancing topology adaptation in information-sharing social networks
Authors:
Giulio Cimini,
Duanbing Chen,
Matus Medo,
Linyuan Lu,
Yi-Cheng Zhang,
Tao Zhou
Abstract:
The advent of the Internet and the World Wide Web has led to unprecedented growth of the available information. People usually cope with information overload by following a limited number of sources which best fit their interests. It has thus become important to address issues like who gets followed and how to allow people to discover new and better information sources. In this paper we conduct an empirical analysis of different online social networking sites, and draw inspiration from its results to present different source selection strategies in an adaptive model for social recommendation. We show that local search rules which enhance the typical topological features of real social communities give rise to network configurations that are globally optimal. These rules create networks which are effective in information diffusion and resemble structures resulting from real social systems.
Submitted 13 April, 2012; v1 submitted 22 July, 2011;
originally announced July 2011.
-
Emergence of scale-free leadership structure in social recommender systems
Authors:
Tao Zhou,
Matus Medo,
Giulio Cimini,
Zi-Ke Zhang,
Yi-Cheng Zhang
Abstract:
The study of the organization of social networks is important for understanding opinion formation, rumor spreading, and the emergence of trends and fashion. This paper reports an empirical analysis of networks extracted from four leading sites with social functionality (Delicious, Flickr, Twitter and YouTube) and shows that they all display a scale-free leadership structure. To reproduce this feature, we propose an adaptive network model driven by social recommendation. Agent-based simulations of this model highlight a "good get richer" mechanism where users with broad interests and good judgment are likely to become popular leaders for the others. Simulations also indicate that the studied social recommendation mechanism can gradually improve the user experience by adapting to the tastes of its users. Finally, we outline implications for real online resource-sharing systems.
Submitted 28 April, 2011; v1 submitted 26 March, 2011;
originally announced March 2011.
-
Heterogeneity, quality, and reputation in an adaptive recommendation model
Authors:
Giulio Cimini,
Matus Medo,
Tao Zhou,
Dong Wei,
Yi-Cheng Zhang
Abstract:
Recommender systems help people cope with the problem of information overload. A recently proposed adaptive news recommender model [Medo et al., 2009] is based on epidemic-like spreading of news in a social network. By means of agent-based simulations we study a "good get richer" feature of the model and determine which attributes are necessary for a user to play a leading role in the network. We further investigate the filtering efficiency of the model as well as its robustness against malicious and spamming behaviour. We show that incorporating user reputation in the recommendation process can substantially improve the outcome.
Submitted 6 December, 2010;
originally announced December 2010.
-
The effect of discrete vs. continuous-valued ratings on reputation and ranking systems
Authors:
Matus Medo,
Joseph Rushton Wakeling
Abstract:
When users rate objects, a sophisticated algorithm that takes into account ability or reputation may produce a fairer or more accurate aggregation of ratings than the straightforward arithmetic average. Recently a number of authors have proposed different co-determination algorithms where estimates of user and object reputation are refined iteratively together, permitting accurate measures of both to be derived directly from the rating data. However, simulations demonstrating these methods' efficacy assumed a continuum of rating values, consistent with typical physical modelling practice, whereas in most actual rating systems only a limited range of discrete values (such as a 5-star system) is employed. We perform a comparative test of several co-determination algorithms with different scales of discrete ratings and show that this seemingly minor modification in fact has a significant impact on algorithms' performance. Paradoxically, where rating resolution is low, increased noise in users' ratings may even improve the overall performance of the system.
Submitted 12 August, 2010; v1 submitted 21 January, 2010;
originally announced January 2010.
-
Building reputation systems for better ranking
Authors:
Luo-Luo Jiang,
Matus Medo,
Joseph R. Wakeling,
Yi-Cheng Zhang,
Tao Zhou
Abstract:
How to rank web pages, scientists and online resources has recently attracted increasing attention from both physicists and computer scientists. In this paper, we study the ranking problem of rating systems where users rate objects with discrete ratings. We propose an algorithm that can simultaneously evaluate user reputation and object quality by iterative refinement. According to both the artificially generated data and the real data from MovieLens and Amazon, our algorithm can considerably enhance the ranking accuracy. This work highlights the significance of reputation systems in the Internet era and points out a way to evaluate and compare the performances of different reputation systems.
Submitted 13 January, 2010;
originally announced January 2010.
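To make the idea of iterative refinement concrete, here is a generic co-determination scheme in Python. It is an assumption, not a transcription of the algorithm proposed in the paper: object quality is taken as the reputation-weighted mean rating, and user reputation as the inverse mean squared deviation of the user's ratings from the current quality estimates, which is one common choice. The function name and the small constant `eps` are illustrative.

```python
import numpy as np


def iterative_refinement(R, mask, n_iter=50, eps=1e-6):
    """Co-determine object quality and user reputation from ratings.

    R[u, o] is user u's rating of object o, valid where mask[u, o] is True.
    Quality = reputation-weighted mean rating of each object.
    Reputation = 1 / (mean squared deviation of a user's ratings from quality).
    """
    mask = np.asarray(mask, dtype=bool)
    # Zero out unrated entries so placeholders (even NaN) cannot leak in.
    R = np.where(mask, np.asarray(R, dtype=float), 0.0)
    M = mask.astype(float)
    rep = np.ones(R.shape[0])
    for _ in range(n_iter):
        weights = rep[:, None] * M
        quality = (weights * R).sum(axis=0) / np.maximum(weights.sum(axis=0), eps)
        sq_dev = M * (R - quality[None, :]) ** 2
        n_rated = np.maximum(M.sum(axis=1), 1.0)
        rep = 1.0 / (sq_dev.sum(axis=1) / n_rated + eps)
    return quality, rep
```

With two users who agree and one who rates the opposite way, the dissenting user's reputation drops after the first few iterations and the quality estimates move to the majority's ratings.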
-
Adaptive model for recommendation of news
Authors:
Matus Medo,
Yi-Cheng Zhang,
Tao Zhou
Abstract:
Most news recommender systems try to identify users' interests and news' attributes and use them to obtain recommendations. Here we propose an adaptive model which combines similarities in users' rating patterns with epidemic-like spreading of news on an evolving network. We study the model by agent-based computer simulations, measure its performance, and discuss its robustness against bias and malicious behavior. As measured by the approval fraction of recommended news, the proposed model outperforms the widely adopted recommendation of news according to their absolute or relative popularity. This model provides a general social mechanism for recommender systems and may also find applications in other types of recommendation.
Submitted 23 October, 2009; v1 submitted 19 October, 2009;
originally announced October 2009.
-
Solving the apparent diversity-accuracy dilemma of recommender systems
Authors:
Tao Zhou,
Zoltan Kuscsik,
Jian-Guo Liu,
Matus Medo,
Joseph R. Wakeling,
Yi-Cheng Zhang
Abstract:
Recommender systems use data on past user preferences to predict possible future likes and interests. A key challenge is that while the most useful individual recommendations are to be found among diverse niche objects, the most reliably accurate results are obtained by methods that recommend objects based on user or object similarity. In this paper we introduce a new algorithm specifically to address the challenge of diversity and show how it can be used to resolve this apparent dilemma when combined in an elegant hybrid with an accuracy-focused algorithm. By tuning the hybrid appropriately we are able to obtain, without relying on any semantic or context-specific information, simultaneous gains in both accuracy and diversity of recommendations.
Submitted 12 March, 2010; v1 submitted 19 August, 2008;
originally announced August 2008.
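The hybrid can be written in a few lines. The sketch below assumes the form `W[a, b] = S[a, b] / (k[a]**(1 - lam) * k[b]**lam)` with `S[a, b] = sum_j A[a, j] * A[b, j] / k_user[j]`, where `a`, `b` are objects, `j` runs over users and `k` denotes degree, so that `lam = 1` gives mass diffusion (ProbS) and `lam = 0` gives heat conduction (HeatS). This is our reading of the published hybrid; the abstract itself does not state the formula, and the function name is illustrative. A user's scores are `W` applied to the vector of objects that user has collected.

```python
import numpy as np


def hybrid_scores(A, user, lam):
    """Score objects for `user` with a HeatS/ProbS hybrid diffusion.

    A[o, u] = 1 if user u has collected object o (objects x users).
    lam = 1 gives mass diffusion (ProbS), lam = 0 heat conduction (HeatS).
    Objects the user already has are scored -inf so they are never recommended.
    """
    A = np.asarray(A, dtype=float)
    k_obj = A.sum(axis=1)
    k_user = A.sum(axis=0)
    # Replace zero degrees by 1 to avoid division by zero for isolated nodes.
    k_obj_safe = np.where(k_obj > 0, k_obj, 1.0)
    k_user_safe = np.where(k_user > 0, k_user, 1.0)
    # S[a, b] = sum over users j of A[a, j] * A[b, j] / k_user[j].
    S = (A / k_user_safe) @ A.T
    W = S / np.outer(k_obj_safe ** (1 - lam), k_obj_safe ** lam)
    f = W @ A[:, user]
    return np.where(A[:, user] > 0, -np.inf, f)
```

On a toy data set where the user's single object shares users with both a popular and a niche object, `lam=1` ranks the popular object first while `lam=0` favours the niche one, which illustrates why tuning the hybrid parameter matters.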
-
Recommendation model based on opinion diffusion
Authors:
Yi-Cheng Zhang,
Matus Medo,
Jie Ren,
Tao Zhou,
Tao Li,
Fan Yang
Abstract:
Information overload in the modern society calls for highly efficient recommendation algorithms. In this letter we present a novel diffusion-based recommendation model, with users' ratings built into a transition matrix. To speed up computation we introduce a Green function method. The numerical tests on a benchmark database show that our prediction is superior to the standard recommendation methods.
Submitted 15 November, 2007; v1 submitted 11 October, 2007;
originally announced October 2007.