Predicting soccer matches with complex networks and machine learning

Authors: Eduardo Alves Baratela, Felipe Jordão Xavier, Thomas Peron, Paulino Ribeiro Villas-Boas, Francisco Aparecido Rodrigues

Abstract: Soccer attracts the attention of many researchers and professionals in the sports industry. Therefore, the incorporation of science into the sport is constantly growing, with increasing investments in performance analysis and sports prediction industries. This study aims to (i) highlight the use of complex networks as an alternative tool for predicting soccer match outcomes, and (ii) show how the combination of structural analysis of passing networks with match statistical data can provide deeper insights into the game patterns and strategies used by teams. To do so, complex network metrics and match statistics were used to build machine learning models that predict the wins and losses of soccer teams in different leagues. The results showed that models based on passing networks were as effective as "traditional" models, which use general match statistics. Another finding was that combining both approaches yielded more accurate models than using either separately, demonstrating that the fusion of such approaches can offer a deeper understanding of game patterns, allowing the comprehension of tactics employed by teams, relationships between players, their positions, and interactions during matches. It is worth mentioning that both network metrics and match statistics were important and impactful for the mixed model. Furthermore, networks with a finer temporal granularity (such as creating a network for each half of the match) performed better than a single network for the entire game.

Submitted 19 September, 2024; originally announced September 2024.

Comments: To appear in Journal of Complex Networks

arXiv:2311.11200 [pdf, other]

Beyond the Power Law: Estimation, Goodness-of-Fit, and a Semiparametric Extension in Complex Networks

Authors: Nixon Jerez-Lillo, Francisco A. Rodrigues, Paulo H. Ferreira, Pedro L. Ramos

Abstract: Scale-free networks play a fundamental role in the study of complex networks and various applied fields due to their ability to model a wide range of real-world systems. A key characteristic of these networks is their degree distribution, which often follows a power-law distribution, where the probability mass function is proportional to $x^{-α}$, with $α$ typically in the range $2 < α < 3$. In this paper, we introduce Bayesian inference methods that yield precise credible intervals and more accurate estimates than traditional methods, which are often biased. Through a simulation study, we demonstrate that our approach provides nearly unbiased estimates for the scaling parameter, enhancing the reliability of inferences. We also evaluate new goodness-of-fit tests as alternatives to the Kolmogorov-Smirnov test commonly used for this purpose. Our findings show that the Watson test offers superior power while maintaining a controlled type I error rate, enabling us to better determine whether data adhere to a power-law distribution. Finally, we propose a piecewise extension of this model to provide greater flexibility, evaluating the estimation and its goodness-of-fit features as well. In the complex networks field, this extension allows us to model the full degree distribution, instead of just focusing on the tail, as is commonly done. We demonstrate the utility of these novel methods through applications to two real-world datasets, showcasing their practical relevance and potential to advance the analysis of power-law behavior.

Submitted 12 January, 2025; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: 33 pages, 11 figures

arXiv:2310.10368 [pdf, other]

Machine learning in physics: a short guide

Authors: Francisco A. Rodrigues

Abstract: Machine learning is a rapidly growing field with the potential to revolutionize many areas of science, including physics. This review provides a brief overview of machine learning in physics, covering the main concepts of supervised, unsupervised, and reinforcement learning, as well as more specialized topics such as causal inference, symbolic regression, and deep learning. We present some of the principal applications of machine learning in physics and discuss the associated challenges and perspectives.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 1 figure. Europhysics Letters (EPL), 2023

arXiv:2310.09131 [pdf, other]

Machine learning-based prediction of Q-voter model in complex networks

Authors: Aruane M. Pineda, Paul Kent, Colm Connaughton, Francisco A. Rodrigues

Abstract: In this article, we consider machine learning algorithms to accurately predict two variables associated with the $Q$-voter model in complex networks, i.e., (i) the consensus time and (ii) the frequency of opinion changes. Leveraging nine topological measures of the underlying networks, we verify that the clustering coefficient (C) and information centrality (IC) emerge as the most important predictors for these outcomes. Notably, the machine learning algorithms demonstrate accuracy across three distinct initialization methods of the $Q$-voter model, including random selection and the involvement of high- and low-degree agents with positive opinions. By unraveling the intricate interplay between network structure and dynamics, this research sheds light on the underlying mechanisms responsible for polarization effects and other dynamic patterns in social systems. Adopting a holistic approach that comprehends the complexity of network systems, this study offers insights into the intricate dynamics associated with polarization effects and paves the way for investigating the structure and dynamics of complex systems through modern machine learning methods.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 32 pages, 10 figures

Journal ref: Journal of Statistical Mechanics: Theory and Experiment (JSTAT), 2023

arXiv:2306.13200 [pdf, other]

Improving Log-Cumulant Based Estimation of Roughness Information in SAR imagery

Authors: Jeova Farias Sales Rocha Neto, Francisco Alixandre Avila Rodrigues

Abstract: Synthetic Aperture Radar (SAR) image understanding is crucial in remote sensing applications, but it is hindered by its intrinsic noise contamination, called speckle. Sophisticated statistical models, such as the $\mathcal{G}^0$ family of distributions, have been applied to SAR data, and many of the current advancements in processing this imagery have been accomplished through extracting information from these models. In this paper, we propose improvements to parameter estimation in $\mathcal{G}^0$ distributions using the Method of Log-Cumulants. First, using Bayesian modeling, we construct estimators that regularly produce reliable roughness estimates under both $\mathcal{G}^0_A$ and $\mathcal{G}^0_I$ models. Second, we make use of an approximation of the Trigamma function to compute the estimated roughness in constant time, making it considerably faster than the existing method for this task. Finally, we show how we can use this method to achieve fast and reliable SAR image understanding based on roughness information.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2303.16859 [pdf, other]

doi 10.1088/2632-072X/acf6a4

Group polarization, influence, and domination in online interaction networks: A case study of the 2022 Brazilian elections

Authors: Ruben Interian, Francisco A. Rodrigues

Abstract: In this work, we investigate the evolution of polarization, influence, and domination in online interaction networks. Twitter data collected before and during the 2022 Brazilian elections is used as a case study. From a theoretical perspective, we develop a methodology called d-modularity that reveals the contribution of specific groups to network polarization using the well-known modularity measure. While the overall network modularity (somewhat unexpectedly) decreased, the proposed group-oriented approach allows us to conclude that the contribution of the right-leaning community to this modularity increased, remaining very high during the analyzed period. Our methodology is general enough to be used in any situation where the contribution of specific groups to overall network modularity and polarization needs to be investigated. Moreover, using the concept of partial domination, we are able to compare the reach of sets of influential profiles from different groups and their ability to accomplish coordinated communication inside their groups and across segments of the entire network during some specific time window. We show that in the whole network, the left-leaning highly influential information spreaders dominated, reaching a substantial fraction of users with fewer spreaders. However, when comparing domination inside the groups, the results are reversed. Right-leaning spreaders dominate their communities using few nodes, proving the most capable of accomplishing coordinated communication. The results bring evidence of extreme isolation and the ease of accomplishing coordinated communication that characterized right-leaning communities during the 2022 Brazilian elections.

Submitted 29 March, 2023; originally announced March 2023.

MSC Class: 05C69; 05C90 ACM Class: J.4; G.2.2

arXiv:2204.05059 [pdf, other]

doi 10.1016/j.chaos.2022.112306

Forecasting new diseases in low-data settings using transfer learning

Authors: Kirstin Roster, Colm Connaughton, Francisco A. Rodrigues

Abstract: Recent infectious disease outbreaks, such as the COVID-19 pandemic and the Zika epidemic in Brazil, have demonstrated both the importance and difficulty of accurately forecasting novel infectious diseases. When new diseases first emerge, we have little knowledge of the transmission process, the level and duration of immunity to reinfection, or other parameters required to build realistic epidemiological models. Time series forecasts and machine learning, while less reliant on assumptions about the disease, require large amounts of data that are also not available in early stages of an outbreak. In this study, we examine how knowledge of related diseases can help make predictions of new diseases in data-scarce environments using transfer learning. We implement both an empirical and a theoretical approach. Using empirical data from Brazil, we compare how well different machine learning models transfer knowledge between two different disease pairs: (i) dengue and Zika, and (ii) influenza and COVID-19. In the theoretical analysis, we generate data using different transmission and recovery rates with an SIR compartmental model, and then compare the effectiveness of different transfer learning methods. We find that transfer learning offers the potential to improve predictions, even beyond a model based on data from the target disease, though the appropriate source disease must be chosen carefully. While imperfect, these models offer an additional input for decision makers during pandemic response.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2110.06140 [pdf, other]

EEG functional connectivity and deep learning for automatic diagnosis of brain disorders: Alzheimer's disease and schizophrenia

Authors: Caroline L. Alves, Aruane M. Pineda, Kirstin Roster, Christiane Thielemann, Francisco A. Rodrigues

Abstract: Mental disorders are among the leading causes of disability worldwide. The first step in treating these conditions is to obtain an accurate diagnosis, but the absence of established clinical tests makes this task challenging. Machine learning algorithms can provide a possible solution to this problem, as we describe in this work. We present a method for the automatic diagnosis of mental disorders based on the matrix of connections obtained from EEG time series and deep learning. We show that our approach can classify patients with Alzheimer's disease and schizophrenia with a high level of accuracy. Comparison with traditional approaches, which use raw EEG time series, shows that our method provides the highest precision. Therefore, the application of deep neural networks to data from brain connections is a very promising method for the diagnosis of neurological disorders.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 10 pages, 5 figures, 9 tables

arXiv:2106.12905 [pdf, other]

Neural Networks for Dengue Prediction: A Systematic Review

Authors: Kirstin Roster, Francisco A. Rodrigues

Abstract: Due to a lack of treatments and of a universal vaccine, early forecasts of Dengue are an important tool for disease control. Neural networks are powerful predictive models that have made contributions to many areas of public health. In this systematic review, we provide an introduction to the neural networks relevant to Dengue forecasting and review their applications in the literature. The objective is to help inform model design for future work. Following the PRISMA guidelines, we conduct a systematic search of studies that use neural networks to forecast Dengue in human populations. We summarize the relative performance of neural networks and comparator models, model architectures and hyper-parameters, as well as choices of input features. Nineteen papers were included. Most studies implement shallow neural networks using historical Dengue incidence and meteorological input features. Prediction horizons tend to be short. Building on the strengths of neural networks, most studies use granular observations at the city or sub-national level. Performance of neural networks relative to comparators such as Support Vector Machines varies across study contexts. The studies suggest that neural networks can provide good predictions of Dengue and should be included in the set of candidate models. The use of convolutional, recurrent, or deep networks is relatively unexplored but offers promising avenues for further research, as does the use of a broader set of input features such as social media or mobile phone data.

Submitted 22 June, 2021; originally announced June 2021.

Comments: 16 pages, 6 figures, 1 table

arXiv:1910.00544 [pdf, other]

A machine learning approach to predicting dynamical observables from network structure

Authors: Francisco A. Rodrigues, Thomas Peron, Colm Connaughton, Jurgen Kurths, Yamir Moreno

Abstract: Estimating the outcome of a given dynamical process from structural features is a key unsolved challenge in network science. The goal is hindered by difficulties associated with nonlinearities, correlations and feedbacks between the structure and dynamics of complex systems. In this work, we develop an approach based on machine learning algorithms that is shown to provide an answer to this challenge. Specifically, we show that it is possible to estimate the outbreak size of a disease starting from a single node as well as the degree of synchronicity of a system made up of Kuramoto oscillators. In doing so, we show which topological features of the network are key for this estimation, and provide a ranking of the importance of network metrics with higher accuracy than previously done. Our approach is general and can be applied to any dynamical process running on top of complex networks. Likewise, our work constitutes an important step towards the application of machine learning methods to unravel dynamical patterns emerging in complex networked systems.

Submitted 1 October, 2019; originally announced October 2019.

Comments: 5 pages including 6 figures

arXiv:1902.00716 [pdf, other]

doi 10.1088/1367-2630/ab687c

Centrality anomalies in complex networks as a result of model over-simplification

Authors: Luiz G. A. Alves, Alberto Aleta, Francisco A. Rodrigues, Yamir Moreno, Luis A. Nunes Amaral

Abstract: Tremendous advances have been made in our understanding of the properties and evolution of complex networks. These advances were initially driven by information-poor empirical networks and theoretical analysis of unweighted and undirected graphs. Recently, information-rich empirical data on complex networks have supported the development of more sophisticated models that include edge directionality and weight properties, and multiple layers. Many studies still focus on unweighted undirected descriptions of networks, prompting an essential question: how to identify when a model is simpler than it must be? Here, we argue that the presence of centrality anomalies in complex networks is a result of model over-simplification. Specifically, we investigate the well-known anomaly in betweenness centrality for transportation networks, according to which highly connected nodes are not necessarily the most central. Using a broad class of network models with weights and spatial constraints and four large data sets of transportation networks, we show that the unweighted projection of the structure of these networks can exhibit a significant fraction of anomalous nodes compared to a random null model. However, the weighted projection of these networks, compared with an appropriate null model, significantly reduces the fraction of anomalies observed, suggesting that centrality anomalies are a symptom of model over-simplification. Because lack of information-rich data is a common challenge when dealing with complex networks and can cause anomalies that misestimate the role of nodes in the system, we argue that sufficiently sophisticated models be used when anomalies are detected.

Submitted 13 March, 2020; v1 submitted 2 February, 2019; originally announced February 2019.

Comments: 14 pages, including 9 figures. APS style. Accepted for publication in New Journal of Physics

Journal ref: New Journal of Physics 22, 013043 (2020)

arXiv:1808.02931 [pdf, other]

doi 10.1103/PhysRevE.99.032301

Mobility helps problem-solving systems to avoid Groupthink

Authors: Paulo F. Gomes, Sandro M. Reia, Francisco A. Rodrigues, José F. Fontanari

Abstract: Groupthink occurs when everyone in a group starts thinking alike, as when people put unlimited faith in a leader. Avoiding this phenomenon is a ubiquitous challenge to problem-solving enterprises and typical countermeasures involve the mobility of group members. Here we use an agent-based model of imitative learning to study the influence of the mobility of the agents on the time they require to find the global maxima of NK-fitness landscapes. The agents cooperate by exchanging information on their fitness and use this information to copy the fittest agent in their influence neighborhoods, which are determined by face-to-face interaction networks. The influence neighborhoods are variable since the agents perform random walks in a two-dimensional space. We find that mobility is slightly harmful for solving easy problems, i.e. problems that do not exhibit suboptimal solutions or local maxima. For difficult problems, however, mobility can prevent the imitative search from being trapped in suboptimal solutions and guarantees a better performance than the independent search for any system size.

Submitted 7 January, 2019; v1 submitted 7 August, 2018; originally announced August 2018.

Journal ref: Phys. Rev. E 99, 032301 (2019)

arXiv:1808.02848 [pdf, other]

Pattern Recognition Approach to Violin Shapes of MIMO database

Authors: Thomas Peron, Francisco A. Rodrigues, Luciano da F. Costa

Abstract: Since the landmarks established by the Cremonese school in the 16th century, the history of violin design has been marked by experimentation. While great effort has been invested since the early 19th century by the scientific community on researching violin acoustics, substantially less attention has been given to the statistical characterization of how the violin shape evolved over time. In this paper we study the morphology of violins retrieved from the Musical Instrument Museums Online (MIMO) database -- the largest freely accessible platform providing information about instruments held in public museums. From the violin images, we derive a set of measurements that reflect relevant geometrical features of the instruments. The application of Principal Component Analysis (PCA) uncovered similarities between violin makers and their respective copyists, as well as among luthiers belonging to the same family lineage, in the context of historical narrative. Combined with a time-windowed approach, thin plate splines visualizations revealed that the average violin outline has remained mostly stable over time, not adhering to any particular trends of design across different periods in music history.

Submitted 8 August, 2018; originally announced August 2018.

arXiv:1706.07972 [pdf, other]

doi 10.1145/3110025.3110039

The Impact of Social Curiosity on Information Spreading on Networks

Authors: Didier A. Vega-Oliveros, Lilian Berton, Federico Vazquez, Francisco A. Rodrigues

Abstract: Most information spreading models consider that all individuals are identical psychologically. They ignore, for instance, the curiosity level of people, which may indicate that they can be influenced to seek information given their interest. For example, the game Pokémon GO spread rapidly because of the aroused curiosity among users. This paper proposes an information propagation model considering the curiosity level of each individual, which is a dynamical parameter that evolves over time. We evaluate the efficiency of our model in contrast to traditional information propagation models, like SIR or IC, and perform analysis on different types of artificial and real-world networks, like Google+, Facebook, and the United States road map. We present a mean-field approach that reproduces with good accuracy the evolution of macroscopic quantities, such as the density of stiflers, for the system's behavior with curiosity. We also obtain an analytical solution of the mean-field equations that allows us to predict a transition from a phase where the information remains confined to a small number of users to a phase where it spreads over a large fraction of the population. The results indicate that curiosity increases the information spreading in all networks as compared with the spreading without curiosity, and that this increase is larger in spatial networks than in social networks. When curiosity is taken into account, the maximum number of informed individuals is reached close to the transition point. Since curious people are more open to new products, concepts, and ideas, this is an important factor to be considered in propagation modeling. Our results contribute to the understanding of the interplay between diffusion processes and dynamical heterogeneous transmission in social networks.

Submitted 24 June, 2017; originally announced June 2017.

Comments: 8 pages, 5 figures

arXiv:1705.00630 [pdf, other]

Influence maximization by rumor spreading on correlated networks through community identification

Authors: Didier A. Vega-Oliveros, Luciano da Fontoura Costa, Francisco Aparecido Rodrigues

Abstract: The identification of the minimal set of nodes that maximizes the propagation of information is one of the most relevant problems in network science. In this paper, we introduce a new method to find the set of initial spreaders to maximize the information propagation in complex networks. We evaluate this method in assortative networks and verify that degree-degree correlation plays a fundamental role in the spreading dynamics. Simulation results show that our algorithm is statistically similar, regarding the average size of outbreaks, to the greedy approach in real-world networks. However, our method is much less time consuming than the greedy algorithm.

Submitted 8 November, 2019; v1 submitted 1 May, 2017; originally announced May 2017.

Journal ref: Communications in Nonlinear Science and Numerical Simulation, 105094 (2019)

arXiv:1612.08388 [pdf, other]

Clustering Algorithms: A Comparative Approach

Authors: Mayra Z. Rodriguez, Cesar H. Comin, Dalcimar Casanova, Odemir M. Bruno, Diego R. Amancio, Francisco A. Rodrigues, Luciano da F. Costa

Abstract: Many real-world systems can be studied in terms of pattern recognition tasks, so that proper use (and understanding) of machine learning methods in practical applications becomes essential. While a myriad of classification methods have been proposed, there is no consensus on which methods are more suitable for a given dataset. As a consequence, it is important to comprehensively compare methods in many possible scenarios. In this context, we performed a systematic comparison of 9 well-known clustering methods available in the R language. In order to account for the many possible variations of data, we considered artificial datasets with several tunable properties (number of classes, separation between classes, etc). In addition, we also evaluated the sensitivity of the clustering methods with regard to their parameter configuration. The results revealed that, when considering the default configurations of the adopted methods, the spectral approach usually outperformed the other clustering algorithms. We also found that the default configuration of the adopted implementations was not always accurate. In these cases, a simple approach based on random selection of parameter values proved to be a good alternative to improve the performance. All in all, the reported approach provides guidance for choosing among clustering algorithms.

Submitted 26 December, 2016; originally announced December 2016.

arXiv:1612.03705 [pdf, other]

Segmentation of large images based on super-pixels and community detection in graphs

Authors: Oscar A. C. Linares, Glenda Michele Botelho, Francisco Aparecido Rodrigues, João Batista Neto

Abstract: Image segmentation has many applications which range from machine learning to medical diagnosis. In this paper, we propose a framework for the segmentation of images based on super-pixels and algorithms for community identification in graphs. The super-pixel pre-segmentation step reduces the number of nodes in the graph, giving the method the ability to process large images. Moreover, community detection algorithms provide more accurate segmentation than traditional approaches, such as those based on spectral graph partition. We also compare our method with two algorithms: a) the graph-based approach by Felzenszwalb and Huttenlocher and b) the contour-based method by Arbelaez. Results have shown that our method provides more precise segmentation and is faster than both of them.

Submitted 12 December, 2016; originally announced December 2016.

Comments: 20 pages, 12 figures

arXiv:1609.00682 [pdf, other]

Unifying Markov Chain Approach for Disease and Rumor Spreading in Complex Networks

Authors: Guilherme Ferraz de Arruda, Francisco A. Rodrigues, Pablo Martin Rodriguez, Emanuele Cozzo, Yamir Moreno

Abstract: Spreading processes are ubiquitous in natural and artificial systems. They can be studied via a plethora of models, depending on the specific details of the phenomena under study. Disease contagion and rumor spreading are among the most important of these processes due to their practical relevance. However, despite the similarities between them, current models address both spreading dynamics separately. In this paper, we propose a general information spreading model that is based on discrete time Markov chains. The model includes all the transitions that are plausible for both a disease contagion process and rumor propagation. We show that our model not only covers the traditional spreading schemes, but that it also contains some features relevant in social dynamics, such as apathy, forgetting, and loss and recovery of interest. The model is evaluated analytically to obtain the spreading thresholds and the early time dynamical behavior for the contact and reactive processes in several scenarios. Comparison with Monte Carlo simulations shows that the Markov chain formalism is highly accurate while it excels in computational efficiency. We round off our work by showing how the proposed framework can be applied to the study of spreading processes occurring on social networks.

Submitted 4 September, 2016; v1 submitted 2 September, 2016; originally announced September 2016.

Comments: 19 pages and 13 figures. APS format. Submitted for publication

arXiv:1512.01418 [pdf, ps, other]

doi 10.1103/PhysRevE.92.032810

Thermodynamic characterization of networks using graph polynomials

Authors: Cheng Ye, Cesar H. Comin, Thomas K. DM. Peron, Filipi N. Silva, Francisco A. Rodrigues, Luciano da F. Costa, Andrea Torsello, Edwin R. Hancock

Abstract: In this paper, we present a method for characterizing the evolution of time-varying complex networks by adopting a thermodynamic representation of network structure computed from a polynomial (or algebraic) characterization of graph structure. Commencing from a representation of graph structure based on a characteristic polynomial computed from the normalized Laplacian matrix, we show how the polynomial is linked to the Boltzmann partition function of a network. This allows us to compute a number of thermodynamic quantities for the network, including the average energy and entropy. Assuming that the system does not change volume, we can also compute the temperature, defined as the rate of change of entropy with energy. All three thermodynamic variables can be approximated using low-order Taylor series that can be computed using the traces of powers of the Laplacian matrix, avoiding explicit computation of the normalized Laplacian spectrum. These polynomial approximations allow a smoothed representation of the evolution of networks to be constructed in the thermodynamic space spanned by entropy, energy, and temperature. We show how these thermodynamic variables can be computed in terms of simple network characteristics, e.g., the total number of nodes and node degree statistics for nodes connected by edges. We apply the resulting thermodynamic characterization to real-world time-varying networks representing complex systems in the financial and biological domains. The study demonstrates that the method provides an efficient tool for detecting abrupt changes and characterizing different stages in network evolution.

Submitted 12 October, 2015; originally announced December 2015.

Comments: 16 pages, 12 figures. Published 25 September 2015

arXiv:1510.03059 [pdf, other]

doi 10.1007/s12064-015-0219-1

Influence of network topology on cooperative problem-solving systems

Authors: José F. Fontanari, Francisco A. Rodrigues

Abstract: The idea of a collective intelligence behind the complex natural structures built by organisms suggests that the organization of social networks is selected so as to optimize problem-solving competence at the group-level. Here we study the influence of the social network topology on the performance of a group of agents whose task is to locate the global maxima of NK fitness landscapes. Agents cooperate by broadcasting messages informing on their fitness and use this information to imitate the fittest agent in their influence networks. In the case those messages convey accurate information on the proximity of the solution (i.e., for smooth fitness landscapes) we find that high connectivity as well as centralization boost the group performance. For rugged landscapes, however, these characteristics are beneficial for small groups only. For large groups, it is advantageous to slow down the information transmission through the network to avoid local maximum traps. Long-range links and modularity have marginal effects on the performance of the group, except for a very narrow region of the model parameters.

Submitted 9 November, 2015; v1 submitted 11 October, 2015; originally announced October 2015.

Journal ref: Theory in Biosciences 135 (2016) 101-110

arXiv:1507.04550 [pdf, other]

On degree-degree correlations in multilayer networks

Authors: Guilherme Ferraz de Arruda, Emanuele Cozzo, Yamir Moreno, Francisco A. Rodrigues

Abstract: We propose a generalization of the concept of assortativity based on the tensorial representation of multilayer networks, covering the definitions given in terms of Pearson and Spearman coefficients. Our approach can also be applied to weighted networks and provides information about correlations considering pairs of layers. By analyzing the multilayer representation of the airport transportation network, we show that contrasting results are obtained when the layers are analyzed independently or as an interconnected system. Finally, we study the impact of the level of assortativity and heterogeneity between layers on the spreading of diseases. Our results highlight the need to study degree-degree correlations on multilayer systems, instead of on aggregated networks.

Submitted 16 July, 2015; originally announced July 2015.

Comments: 8 pages, 3 figures

arXiv:1504.05567 [pdf, other]

Multilayer networks: metrics and spectral properties

Authors: Emanuele Cozzo, Guilherme Ferraz de Arruda, Francisco A. Rodrigues, Yamir Moreno

Abstract: Multilayer networks represent systems in which there are several topological levels, each one representing one kind of interaction or interdependency between the systems' elements. These networks have attracted a lot of attention recently because their study allows considering different dynamical modes concurrently. Here, we review the main concepts and tools developed to date. Specifically, we focus on several metrics for multilayer network characterization as well as on the spectral properties of the system, which ultimately enable the dynamical characterization of several critical phenomena. The theoretical framework is also applied to the description of real-world multilayer systems.

Submitted 21 April, 2015; originally announced April 2015.

Comments: Chapter contribution to the book "Interconnected networks", edited by F. Schweitzer and A. Garas

arXiv:1407.0224 [pdf, other]

doi 10.1016/j.ins.2015.11.014

Concentric Network Symmetry

Authors: Filipi N. Silva, Cesar H. Comin, Thomas K. DM. Peron, Francisco A. Rodrigues, Cheng Ye, Richard C. Wilson, Edwin Hancock, Luciano da F. Costa

Abstract: Quantification of symmetries in complex networks is typically done globally in terms of automorphisms. Extending previous methods to locally assess the symmetry of nodes is not straightforward. Here we present a new framework to quantify the symmetries around nodes, which we call connectivity patterns. We develop two topological transformations that allow a concise characterization of the different types of symmetry appearing on networks and apply these concepts to six network models, namely the Erdős-Rényi, Barabási-Albert, random geometric graph, Waxman, Voronoi and rewired Voronoi. Real-world networks, namely the scientific areas of Wikipedia, the world-wide airport network and the street networks of Oldenburg and San Joaquin, are also analyzed in terms of the proposed symmetry measurements. Several interesting results emerge from this analysis, including the high symmetry exhibited by the Erdős-Rényi model. Additionally, we found that the proposed measurements present low correlation with other traditional metrics, such as node degree and betweenness centrality. Principal component analysis is used to combine all the results, revealing that the concepts presented here have substantial potential to also characterize networks at a global scale.

Submitted 2 October, 2014; v1 submitted 1 July, 2014; originally announced July 2014.

arXiv:1404.4528 [pdf, other]

The role of centrality for the identification of influential spreaders in complex networks

Authors: Guilherme Ferraz de Arruda, André Luiz Barbieri, Pablo Martín Rodriguez, Yamir Moreno, Luciano da Fontoura Costa, Francisco Aparecido Rodrigues

Abstract: The identification of the most influential spreaders in networks is important to control and understand the spreading capabilities of the system as well as to ensure an efficient information diffusion such as in rumor-like dynamics. Recent works have suggested that the identification of influential spreaders is not independent of the dynamics being studied. For instance, the key disease spreaders might not necessarily be so when it comes to analyzing social contagion or rumor propagation. Additionally, it has been shown that different metrics (degree, coreness, etc) might identify different influential nodes even for the same dynamical processes, with varying degrees of accuracy. In this paper, we investigate how nine centrality measures correlate with the disease and rumor spreading capabilities of the nodes that make up different synthetic and real-world (both spatial and non-spatial) networks. We also propose a generalization of the random walk accessibility as a new centrality measure and derive analytical expressions for the latter measure for simple network configurations. Our results show that for non-spatial networks, the $k$-core and degree centralities are most correlated to epidemic spreading, whereas the average neighborhood degree, the closeness centrality and accessibility are most related to rumor dynamics. On the contrary, for spatial networks, the accessibility measure outperforms the rest of the centrality metrics in almost all cases regardless of the kind of dynamics considered. Therefore, an important consequence of our analysis is that previous studies performed in synthetic random networks cannot be generalized to the case of spatial networks.

Submitted 17 April, 2014; originally announced April 2014.

Comments: 17 pages, 11 figures, 3 tables

arXiv:1311.0202 [pdf, other]

doi 10.1371/journal.pone.0094137

A systematic comparison of supervised classifiers

Authors: D. R. Amancio, C. H. Comin, D. Casanova, G. Travieso, O. M. Bruno, F. A. Rodrigues, L. da F. Costa

Abstract: Pattern recognition techniques have been employed in a myriad of industrial, medical, commercial and academic applications. To tackle such a diversity of data, many techniques have been devised. However, despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, the consideration of as many techniques as possible presents itself as a fundamental practice in applications aiming at high accuracy. Typical works comparing methods either emphasize the performance of a given algorithm in validation tests or systematically compare various algorithms, assuming that the practical use of these methods is done by experts. On many occasions, however, researchers have to deal with their practical classification tasks without an in-depth knowledge about the underlying mechanisms behind parameters. Actually, the adequate choice of classifiers and parameters alike in such practical circumstances constitutes a long-standing problem and is the subject of the current paper. We carried out a study on the performance of nine well-known classifiers implemented by the Weka framework and compared the dependence of the accuracy on their parameter configurations. The analysis of performance with default parameters revealed that the k-nearest neighbors method exceeds by a large margin the other methods when high dimensional datasets are considered. When other parameter configurations were allowed, we found that it is possible to improve the quality of SVM by more than 20% even if parameters are set randomly. Taken together, the investigation conducted in this paper suggests that, apart from the SVM implementation, Weka's default configuration of parameters provides a performance close to the one achieved with the optimal configuration.

Submitted 16 October, 2013; originally announced November 2013.

Journal ref: PLoS ONE 9 (4): e94137, 2014

arXiv:1310.3389 [pdf, other]

doi 10.1209/0295-5075/121/68001

Spectra of random networks in the weak clustering regime

Authors: Thomas K. DM. Peron, Peng Ji, Jürgen Kurths, Francisco A. Rodrigues

Abstract: The asymptotic behaviour of dynamical processes in networks can be expressed as a function of spectral properties of the corresponding adjacency and Laplacian matrices. Although many theoretical results are known for the spectra of traditional configuration models, networks generated through these models fail to describe many topological features of real-world networks, in particular non-null values of the clustering coefficient. Here we study effects of cycles of order three (triangles) in network spectra. By using recent advances in random matrix theory, we determine the spectral distribution of the network adjacency matrix as a function of the average number of triangles attached to each node for networks without modular structure and degree-degree correlations. Implications for network dynamics are discussed. Our findings can shed light on the study of how particular kinds of subgraphs influence network dynamics.

Submitted 27 May, 2018; v1 submitted 12 October, 2013; originally announced October 2013.

Journal ref: Europhys. Lett. 121, 68001 (2018)

arXiv:1203.4807 [pdf, other]

doi 10.1016/j.joi.2013.01.007

Quantifying the interdisciplinarity of scientific journals and fields

Authors: Filipi Nascimento Silva, Francisco Aparecido Rodrigues, Osvaldo Novais de Oliveira Junior, Luciano da Fontoura Costa

Abstract: There is an overall perception of increased interdisciplinarity in science, but this is difficult to confirm quantitatively owing to the lack of adequate methods to evaluate subjective phenomena. This is no different from the difficulties in establishing quantitative relationships in human and social sciences. In this paper we quantified the interdisciplinarity of scientific journals and science fields by using an entropy measurement based on the diversity of the subject categories of journals citing a specific journal. The methodology consisted in building citation networks using the Journal Citation Reports database, in which the nodes were journals and edges were established based on citations among journals. The overall network for the 11-year period (1999-2009) studied was small-world and scale free with regard to the in-strength. Upon visualizing the network topology an overall structure of the various science fields could be inferred, especially their interconnections. We confirmed quantitatively that science fields are becoming increasingly interdisciplinary, with the degree of interdisciplinarity (i.e. entropy) correlating strongly with the in-strength of journals and with the impact factor.

Submitted 21 March, 2012; originally announced March 2012.

Comments: 23 pages, 6 figures

Journal ref: Journal of Informetrics. Volume 7, Issue 2, Pages 469-477, 2013

arXiv:1102.0099 [pdf, other]

doi 10.1371/journal.pone.0015765

Automatic Network Fingerprinting through Single-Node Motifs

Authors: Christoph Echtermeyer, Luciano da Fontoura Costa, Francisco A. Rodrigues, Marcus Kaiser

Abstract: Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs---a combination of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett., 87, 1, 2009). Here, we first suggest improvements to the method including how its parameters can be determined automatically. Such automatic routines make high-throughput studies of many networks feasible. Second, the new routines are validated in different network-series. Third, we provide an example of how the method can be used to analyse network time-series. In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. In contrast to classical motif analysis, our approach can identify individual components (here: nodes) that are specific to a network. Such special nodes, as hubs before, might be found to play critical roles in real-world networks.

Submitted 1 February, 2011; originally announced February 2011.

Comments: 16 pages (4 figures) plus supporting information 8 pages (5 figures)

Journal ref: Echtermeyer C, da Fontoura Costa L, Rodrigues FA, Kaiser M (2011) Automatic Network Fingerprinting through Single-Node Motifs. PLoS ONE 6(1): e15765

arXiv:1101.5141 [pdf, ps, other]

A Complex Networks Approach for Data Clustering

Authors: Francisco A. Rodrigues, Guilherme Ferraz de Arruda, Luciano da Fontoura Costa

Abstract: Many methods have been developed for data clustering, such as k-means, expectation maximization and algorithms based on graph theory. In this latter case, graphs are generally constructed by taking into account the Euclidean distance as a similarity measure, and partitioned using spectral methods. However, these methods are not accurate when the clusters are not well separated. In addition, it is not possible to automatically determine the number of clusters. These limitations can be overcome by taking into account network community identification algorithms. In this work, we propose a methodology for data clustering based on complex networks theory. We compare different metrics for quantifying the similarity between objects and take into account three community finding techniques. This approach is applied to two real-world databases and to two sets of artificially generated data. By comparing our method with traditional clustering approaches, we verify that the proximity measures given by the Chebyshev and Manhattan distances are the most suitable metrics to quantify the similarity between objects. In addition, the community identification method based on the greedy optimization provides the smallest misclassification rates.

Submitted 26 January, 2011; originally announced January 2011.

Comments: 9 pages, 8 Figures

Showing 1–29 of 29 results for author: Rodrigues, F A