-
A Systematic Approach for Studying How Topological Measurements Respond to Complex Networks Modifications
Authors:
Alexandre Benatti,
Roberto M. Cesar Jr.,
Luciano da F. Costa
Abstract:
Different types of graphs and complex networks have been characterized, analyzed, and modeled based on measurements of their respective topology. However, the available networks may constitute approximations of the original structure as a consequence of sampling incompleteness, noise, and/or error in the representation of that structure. Therefore, it becomes of particular interest to quantify how successive modifications may impact a set of adopted topological measurements, and how the respective changes can be interrelated, which is addressed in this paper by considering similarity networks and hierarchical clustering approaches. These studies are developed with respect to several topological measurements (accessibility, degree, hierarchical degree, clustering coefficient, betweenness centrality, assortativity, and average shortest path) calculated from complex networks of three main types (Erdős-Rényi, Barabási-Albert, and geographical) with varying sizes or subjected to progressive edge removal or rewiring. The coincidence similarity index, which can implement particularly strict comparisons, is adopted for two main purposes: to quantify and visualize how the considered topological measurements respond to the considered network alterations and to represent hierarchically the relationships between the observed changes undergone by the considered topological measurements. Several results are reported and discussed, including the identification of three types of topological changes taking place as a consequence of the modifications. In addition, the topological changes observed for the Erdős-Rényi and Barabási-Albert networks were more similar to each other than to those observed for the geometrical networks. The latter type of network has been identified to have more heterogeneous topological features than the other two types of networks.
Submitted 28 May, 2025;
originally announced May 2025.
-
Normalization in Proportional Feature Spaces
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
The subject of feature normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling, as it can substantially influence and be influenced by all of these activities and respective aspects. The selection of an appropriate normalization method needs to take into account the type and characteristics of the involved features, the methods to be used subsequently for the just mentioned data processing, as well as the specific questions being considered. After briefly considering how normalization constitutes one of the many interrelated parts typically involved in data analysis and modeling, the present work addresses the important issue of feature normalization from the perspective of uniform and proportional (right skewed) features and comparison operations. More general right skewed features are also considered in an approximated manner. Several concepts, properties, and results are described and discussed, including the description of a duality relationship between uniform and proportional feature spaces and respective comparisons, specifying conditions for consistency between comparisons in each of the two domains. Two normalization possibilities based on non-centralized dispersion of features are also presented, as is a modified version of the Jaccard similarity index which intrinsically incorporates normalization. Preliminary experiments are presented in order to illustrate the developed concepts and methods.
Submitted 17 September, 2024;
originally announced September 2024.
-
Supervised Pattern Recognition Involving Skewed Feature Densities
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technological activities. At the same time, pattern recognition involves several challenges, including the choice of features to represent the data elements, as well as possible respective transformations. In the present work, the classification potential of the Euclidean distance and a dissimilarity index based on the coincidence similarity index are compared by using the k-neighbors supervised classification method with respect to features resulting from several types of transformations of one- and two-dimensional symmetric densities. Given two groups characterized by respective densities without or with overlap, different types of respective transformations are obtained and employed to quantitatively evaluate the performance of k-neighbors methodologies based on the Euclidean distance and the coincidence similarity index. More specifically, the accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account for the comparison. Several interesting results are described and discussed, including the enhanced potential of the dissimilarity index for classifying datasets with right skewed feature densities, as well as the identification that the sharpness of the comparison between data elements can be independent of the respective supervised classification performance.
Submitted 2 September, 2024;
originally announced September 2024.
-
Simple Games on Complex Networks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
The relationship between topology and dynamics of complex systems has motivated continuing interest from the scientific community. In the present work, we address this interesting topic from the perspective of simple games, involving two teams playing according to a small set of simple rules, taking place on four types of complex networks. Starting from a minimalist game, characterized by full symmetry always leading to ties, four other games are described in progressive order of complexity, taking into account the presence of neighbors as well as strategies. Each of these five games, as well as their specific changes when implemented in four types of networks, are studied in terms of statistics of the total duration of the game as well as the number of victories and ties, with several interesting results that substantiate, in some cases, the importance of the network topology on the respective dynamics. As a subsidiary result, the visualization of relationships between the data elements in terms of coincidence similarity networks allowed a more complete and direct interpretation of the obtained results.
Submitted 21 June, 2024;
originally announced June 2024.
-
Subsuming Complex Networks by Node Walks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
The concept of node walk in graphs and complex networks has been addressed, consisting of one or more nodes that move into adjacent nodes, henceforth incorporating the respective connections. This type of dynamics is then applied to subsume complex networks. Three types of networks (Erdős-Rényi, Barabási-Albert, as well as a geometric model) are considered, while three node walk heuristics (uniformly random, largest degree, and smallest degree) are taken into account. Several interesting results are obtained and described, including the identification that the subsuming dynamics depend strongly on both the specific topology of the networks and the criteria controlling the node walks. The use of node walks as a model for studying the relationship between network topology and dynamics is motivated by this result. In addition, relatively high correlations between the initial node degree and the accumulated strength of the walking node were observed for some combinations of network types and dynamic rules, allowing some of the properties of the subsumption to be roughly predicted from the initial topology around the walking node, which has been found, however, not to be enough for full determination of the subsumption dynamics. Another interesting result regards the quite distinct signatures (along the iterations) of walking node strengths obtained for the several considered combinations of network type and subsumption rules.
Submitted 5 June, 2024;
originally announced June 2024.
-
Node Accessibility Characterization of Radially-Grown Structures
Authors:
Alexandre Benatti,
Roberto M. Cesar Jr.,
Luciano da F. Costa
Abstract:
Complex systems have motivated continuing interest from the scientific community, leading to new concepts and methods. Growing systems represent a case of particular interest, as their topological, geometrical, and also dynamical properties change over time, as new elements are incorporated into the existing structure. In the present work, we approach the case in which systems grow radially around some straight axis of reference, such as particle deposition on electrodes, or urban expansion along avenues, roads, coastline, or rivers, among several other possibilities. More specifically, we aim at characterizing the topological properties of simulated growing structures, which are represented as graphs, in terms of a measurement corresponding to the accessibility of each involved node. The incorporation of new elements (nodes and links) is performed preferentially according to the angular orientation with respect to the reference axis. Several interesting results are reported, including the tendency of structures grown preferentially along the orientation normal to the axis to have smaller accessibility.
Submitted 24 May, 2024;
originally announced May 2024.
-
Distance-Based Hierarchical Cutting of Complex Networks with Non-Preferential and Preferential Choice of Seeds
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Graphs and complex networks can be successively separated into connected components associated with respective seed nodes, therefore establishing a respective hierarchical organization. In the present work, we study the properties of the hierarchical structure implied by distance-based cutting of Erdős-Rényi, Barabási-Albert, and a specific geometric network. Two main situations are considered regarding the choice of the seeds: non-preferential and preferential to the respective node degree. Among the findings is the tendency of geometrical networks to yield more balanced pairs of connected components along the progressive separation of the network, presenting little chaining effect, followed by the Erdős-Rényi and Barabási-Albert types of networks. The choice of seeds preferential to the node degree tended to enhance the balance of the connected components in the case of the geometrical networks.
Submitted 26 March, 2024;
originally announced March 2024.
-
Hierarchical Cutting of Complex Networks Performed by Random Walks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Several interesting approaches have been reported in the literature on complex networks, random walks, and hierarchy of graphs. While many of these works perform random walks on stable, fixed networks, in the present work we address the situation in which the connections traversed by each step of a uniformly random walk are progressively removed, yielding a successively less interconnected structure that may break into two components, therefore establishing a respective hierarchy. The sizes of each of these pairs of sliced networks, as well as the permanence of each connected component, are studied in the present work. Several interesting results are reported, including the tendency of geometrical networks sometimes to be broken into two components with comparably large sizes.
Submitted 11 March, 2024;
originally announced March 2024.
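The cutting dynamics summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the adjacency-dictionary representation, the function names `components` and `walk_until_split`, and the choice to stop at the first split are all assumptions made here for illustration.

```python
import random


def components(adj):
    """Return the connected components of an undirected graph {node: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def walk_until_split(adj, start, rng):
    """Uniformly random walk that deletes each traversed edge (hypothetical helper).

    Stops as soon as the graph has more than one connected component, or when
    the walker has no edge left to follow. Returns (steps, components).
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    node, steps = start, 0
    while adj[node]:
        nxt = rng.choice(sorted(adj[node]))  # uniform choice among current neighbors
        adj[node].discard(nxt)               # remove the traversed edge in both directions
        adj[nxt].discard(node)
        node = nxt
        steps += 1
        comps = components(adj)
        if len(comps) > 1:
            return steps, comps
    return steps, components(adj)
```

Applying the same procedure recursively to each resulting component would yield a hierarchy of cuts of the kind the abstract describes; the sketch stops at the first split to stay short.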
-
Detecting Groups in Directed and Non-Directed Bipartite Networks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Bipartite networks provide an effective resource for representing, characterizing, and modeling several abstract and real-world systems and structures involving binary relations, which include food webs, social interactions, and customer-product relationships. Of particular interest is the problem of identifying, given a specific bipartite network, possible respective groups or clusters characterized by similar interconnecting patterns. The present work approaches this issue by extending and complementing a previously described coincidence similarity methodology (bioRxiv, doi.org/10.1101/2022.07.16.500294) in several manners, including the consideration of directed and non-directed bipartite networks, the characterization of groups in those networks, as well as considering synthetic bipartite networks presenting groups as a resource for studying the performance of the described methodology. Several interesting results are described and discussed, including the corroboration of the potential of the coincidence similarity methodology for achieving enhanced separation between the groups in bipartite networks.
Submitted 31 January, 2024;
originally announced January 2024.
-
Random Walks Performed by Topologically-Specific Agents on Complex Networks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Random walks by single-node agents have been systematically conducted on various types of complex networks in order to investigate how their topologies can affect the dynamics of the agents. However, by fitting any network node, these agents do not engage in topological interactions with the network. In the present work, we describe random walks on complex networks performed by agents that are actually small graphs. These agents can only occupy admissible portions of the network onto which they fit topologically, hence the name topologically-specific agents. These agents are also allowed to move to adjacent subgraphs in the network, which have each node adjacent to a distinct original respective node of the agent. Given a network and a specific agent, it is possible to obtain a respective associated network, in which each node corresponds to a possible instance of the agent and the edges indicate adjacent positions. Associated networks are obtained and studied with respect to three types of topologically-specific agents (triangle, square, and slashed square) considering three types of complex networks (geometrical, Erdős-Rényi, and Barabási-Albert). Uniform random walks are also performed on these structures, as well as networks respectively obtained by removing the five nodes with the highest degree, and studied in terms of the number of covered nodes along the walks. Several results are reported and discussed, including the fact that substantially distinct associated networks can be obtained for each of the three considered agents and for varying average node degrees. Regarding the coverage of the networks by uniform random walks, the square agent led to the most effective coverage of the nodes, followed by the triangle and slashed square agents. In addition, the geometric network turned out to be less effectively covered.
Submitted 15 May, 2025; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Parallel and Sequential Resources Networks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
A large number of real and abstract systems involve the transformation of some basic resource into respective products under the action of multiple processing agents, which can be understood as multiple-agent production systems (MAP). At each discrete time instant, for each agent, a fraction of the resources is assumed to be kept, forwarded to other agents, or converted into work with some efficiency. The present work describes a systematic study of nine basic MAP architectures subdivided into two main groups, namely parallel and sequential distribution of resources from a single respective source. Several types of interconnections among the involved processing agents are also considered. The resulting MAP architectures are studied in terms of the total amount of work, the dispersion of the resources (states) among the agents, and the transition times from the start of operation until the respective steady state. Several interesting results are obtained and discussed, including the observation that some of the parallel designs were able to yield maximum work and minimum state dispersion, achieved at the expense of the transition time and use of several interconnections between the source and the agents. The results obtained for the sequential designs indicate that relatively high performance can be obtained for some specific cases.
Submitted 16 November, 2023;
originally announced November 2023.
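The keep/forward/convert rule stated in this abstract can be made concrete with a small sketch of one possible sequential design. Everything specific here is an assumption for illustration and is not taken from the paper: the function name `simulate_chain`, identical fractions for all agents, a constant source feeding only the first agent, and the discarding of whatever the last agent forwards.

```python
def simulate_chain(n_agents, keep, forward, efficiency, source=1.0, steps=200):
    """Simulate a hypothetical sequential chain of processing agents.

    At each discrete time step every agent keeps a fraction `keep` of its
    resources, forwards a fraction `forward` to the next agent, and converts
    the remaining fraction into work with the given `efficiency`.
    Returns (final_states, total_work).
    """
    if keep < 0 or forward < 0 or keep + forward > 1:
        raise ValueError("keep and forward must be non-negative and sum to at most 1")
    convert = 1.0 - keep - forward
    states = [0.0] * n_agents
    total_work = 0.0
    for _ in range(steps):
        # Work produced from the resources currently held by all agents.
        total_work += efficiency * convert * sum(states)
        # The source feeds the first agent; each other agent receives the
        # forwarded share of its predecessor.
        inflow = [source] + [forward * s for s in states[:-1]]
        states = [keep * s + r for s, r in zip(states, inflow)]
    return states, total_work
```

Under these assumptions the first agent settles at `source / (1 - keep)`, so the number of steps needed to approach that value gives a rough analogue of the transition time mentioned in the abstract.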
-
Simple Bundles of Complex Networks
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Complex networks can be used to represent and model an ample diversity of abstract and real-world systems and structures. A good deal of the research on these structures has focused on specific topological properties, including node degree, shortest paths, and modularity. In the present work, we develop an approach aimed at identifying and characterizing simple bundles of interconnections between pairs of nodes (source and destination) in complex networks. More specifically, simple bundles can be understood as corresponding to the bundle of paths obtained while traveling through successive neighborhoods after departing from a given source node. Because no node appears more than once along a given bundle, these structures have been said to be simple, in analogy to the concept of a simple path. In addition to describing simple bundles and providing a possible methodology for their identification, we also consider how their respective effective width can be estimated in terms of diffusion flow and exponential entropy of transition probabilities. The potential of the concepts and methods described in this work is then illustrated with respect to the characterization and analysis of model-theoretic networks, with several interesting results.
Submitted 7 November, 2023;
originally announced November 2023.
-
Multilayer Multiset Neuronal Networks -- MMNNs
Authors:
Alexandre Benatti,
Luciano da Fontoura Costa
Abstract:
The coincidence similarity index, based on a combination of the Jaccard and overlap similarity indices, has noticeable properties in comparing and classifying data, including enhanced selectivity and sensitivity, intrinsic normalization, and robustness to data perturbations and outliers. These features allow multiset neurons, which are based on the coincidence similarity operation, to perform effective pattern recognition applications, including the challenging task of image segmentation. A few prototype points have been used in previous related approaches to represent each pattern to be identified, each of them being associated with respective multiset neurons. The segmentation of the regions can then proceed by taking into account the outputs of these neurons. The present work describes multilayer multiset neuronal networks incorporating two or more layers of coincidence similarity neurons. In addition, as a means to improve performance, this work also explores the utilization of counter-prototype points, which are assigned to the image regions to be avoided. This approach is shown to allow effective segmentation of complex regions despite considering only one prototype and one counter-prototype point. As reported here, the balanced accuracy landscapes to be optimized in order to identify the weight of the neurons in subsequent layers have been found to be relatively smooth, while typically involving more than one attraction basin. The use of a simple gradient-based optimization methodology has been demonstrated to effectively train the considered neural networks with several architectures, at least for the given data type, configuration of parameters, and network architecture.
Submitted 28 August, 2023;
originally announced August 2023.
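As a rough illustration of the index named in this abstract, the sketch below combines a real-valued Jaccard index with an overlap index by taking their product. The product form, the restriction to non-negative feature vectors, and the function names are assumptions made here; the listing itself only states that the coincidence index is "based on a combination" of the two.

```python
def jaccard(x, y):
    """Real-valued Jaccard index of two non-negative feature vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def overlap(x, y):
    """Overlap index: shared mass relative to the smaller of the two vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = min(sum(x), sum(y))
    return num / den if den > 0 else 0.0


def coincidence(x, y):
    """Coincidence similarity, assumed here to be the product of the two indices."""
    return jaccard(x, y) * overlap(x, y)
```

Because both factors lie in [0, 1] for non-negative inputs, the product is never larger than either factor, which is one way to see why such a combination yields stricter comparisons than the Jaccard index alone.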
-
Two Approaches to Supervised Image Segmentation
Authors:
Alexandre Benatti,
Luciano da F. Costa
Abstract:
Though performed almost effortlessly by humans, segmenting 2D gray-scale or color images into respective regions of interest (e.g., background, objects, or portions of objects) constitutes one of the greatest challenges in science and technology as a consequence of several effects including dimensionality reduction (3D to 2D), noise, reflections, shades, and occlusions, among many other possibilities. While a large number of interesting related approaches have been suggested over the last decades, it was mainly thanks to the recent development of deep learning that more effective and general solutions have been obtained, currently constituting the basic comparison reference for this type of operation. Also developed recently, a multiset-based methodology has been described that is capable of encouraging image segmentation performance, combining spatial accuracy, stability, and robustness while requiring little computational resources (hardware and/or training and recognition time). The interesting features of the multiset neurons methodology mostly follow from the enhanced selectivity and sensitivity, as well as good robustness to data perturbations and outliers, allowed by the coincidence similarity index on which the multiset approach to supervised image segmentation is founded. After describing the deep learning and multiset neurons approaches, the present work develops comparison experiments between them which are primarily aimed at illustrating their respective main interesting features when applied to the adopted specific type of data and parameter configurations. While the deep learning approach confirmed its potential for performing image segmentation, the alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
Submitted 22 August, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Using Full-Text Content to Characterize and Identify Best Seller Books
Authors:
Giovana D. da Silva,
Filipi N. Silva,
Henrique F. de Arruda,
Bárbara C. e Souza,
Luciano da F. Costa,
Diego R. Amancio
Abstract:
Artistic pieces can be studied from several perspectives, one example being their reception among readers over time. In the present work, we approach this interesting topic from the standpoint of literary works, particularly assessing the task of predicting whether a book will become a best seller. Unlike previous approaches, we focused on the full content of books and considered visualization and classification tasks. We employed visualization for the preliminary exploration of the data structure and properties, involving SemAxis and linear discriminant analyses. Then, to obtain quantitative and more objective results, we employed various classifiers. Such approaches were used along with a dataset containing (i) books published from 1895 to 1924 and consecrated as best sellers by the Publishers Weekly Bestseller Lists and (ii) literary works published in the same period but not mentioned in those lists. Our comparison of methods revealed that the best result, obtained by combining a bag-of-words representation with a logistic regression classifier, led to an average accuracy of 0.75 both for the leave-one-out and 10-fold cross-validations. Such an outcome suggests that it is unfeasible to predict the success of books with high accuracy using only the full content of the texts. Nevertheless, our findings provide insights into the factors leading to the relative success of a literary work.
Submitted 11 May, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Text characterization based on recurrence networks
Authors:
Bárbara C. e Souza,
Filipi N. Silva,
Henrique F. de Arruda,
Giovana D. da Silva,
Luciano da F. Costa,
Diego R. Amancio
Abstract:
Several complex systems are characterized by presenting intricate characteristics taking place at several scales of time and space. These multiscale characterizations are used in various applications, including better understanding diseases, characterizing transportation systems, and comparing cities, among others. In particular, texts are also characterized by a hierarchical structure that can be approached by using multiscale concepts and methods. The multiscale properties of texts constitute a subject worth further investigation. In addition, more effective approaches to text characterization and analysis can be obtained by emphasizing words with potentially more informational content. The present work aims at developing these possibilities while focusing on mesoscopic representations of networks. More specifically, we adopt an extension to the mesoscopic approach to represent text narratives, in which only the recurrent relationships among tagged parts of speech (subject, verb and direct object) are considered to establish connections among sequential pieces of text (e.g., paragraphs). The characterization of the texts was then achieved by considering scale-dependent complementary methods: accessibility, symmetry and recurrence signatures. In order to evaluate the potential of these concepts and methods, we approached the problem of distinguishing between literary genres (fiction and non-fiction). A set of 300 books organized into the two genres was considered and compared by using the aforementioned approaches. All the methods were capable of differentiating to some extent between the two genres. The accessibility and symmetry reflected the narrative asymmetries, while the recurrence signature provided a more direct indication about the non-sequential semantic connections taking place along the narrative.
Submitted 2 May, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
The Classic Cross-Correlation and the Real-Valued Jaccard and Coincidence Indices
Authors:
Luciano da F. Costa
Abstract:
In this work we describe and compare the classic inner product and Pearson correlation coefficient as well as the recently introduced real-valued Jaccard and coincidence indices. Special attention is given to diverse schemes for taking into account the signs of the operands, as well as to the study of the geometry of the scalar field surface related to the generalized multiset binary operations underlying the considered similarity indices. The possibility to split the classic inner product, cross-correlation, and Pearson correlation coefficient is also described.
Submitted 25 November, 2021;
originally announced December 2021.
-
Multiset Neurons
Authors:
Luciano da F. Costa
Abstract:
The present work reports a comparative performance of artificial neurons obtained in terms of the real-valued Jaccard and coincidence similarity indices and respectively derived functionals. The interiority index and classic cross-correlation are also included for comparison purposes. After presenting the basic concepts related to real-valued multisets and the adopted similarity metrics, including the generalization of the real-valued Jaccard and coincidence indices to higher orders, we proceed to studying the response of a single neuron, not taking into account the output non-linearity (e.g., sigmoid), respectively to the detection of a Gaussian two-dimensional stimulus in the presence of displacement, magnification, intensity variation, noise and interference from additional patterns. It is shown that the real-valued Jaccard and coincidence approaches are substantially more robust and effective than the interiority index and the classic cross-correlation. The coincidence-based neurons are shown to have the best overall performance respectively to the considered type of data and perturbations. The potential of the multiset neurons is further illustrated with respect to the challenging problem of image segmentation, leading to impressive cost/benefit performance. The reported concepts, methods, and results have substantial implications not only for pattern recognition and machine learning, but also regarding neurobiology and neuroscience.
Submitted 23 April, 2022; v1 submitted 13 November, 2021;
originally announced November 2021.
-
Multiset Signal Processing and Electronics
Authors:
Luciano da F. Costa
Abstract:
Multisets are an intuitive extension of the traditional concept of sets that allow repetition of elements, with the number of times each element appears being understood as the respective multiplicity. Recent generalizations of multisets to real-valued functions, accounting for possibly negative values, have paved the way to a number of interesting implications and applications, including respective implementations as electronic systems. The basic multiset operations include the set complementation (sign change), intersection (minimum between two values), union (maximum between two values), difference and sum (identical to the algebraic counterparts). When applied to functions or signals, the sign and conjoint sign functions are also required. Given that signals are functions, it becomes possible to effectively translate the multiset and multifunction operations to analog electronics, which is the objective of the present work. It is proposed that effective multiset operations capable of high performance self and cross-correlation can be obtained with relative simplicity in either discrete or integrated circuits. The problem of switching noise is also briefly discussed. The present results have great potential for applications and related developments in analog and digital electronics, as well as for pattern recognition, signal processing, and deep learning.
Submitted 13 November, 2021;
originally announced November 2021.
-
Comparing Cross Correlation-Based Similarities
Authors:
Luciano da F. Costa
Abstract:
The real-valued Jaccard and coincidence indices, in addition to their conceptual and computational simplicity, have been verified to be able to provide promising results in tasks such as template matching, tending to yield peaks that are sharper and narrower than those typically obtained by standard cross-correlation, while also substantially attenuating secondary matches. In this work, the multiset-based correlations based on the real-valued multiset Jaccard and coincidence indices are compared from the perspective of template matching, with encouraging results which have implications for pattern recognition, deep learning, and scientific modeling in general. The multiset-based correlation methods, and especially the coincidence index, presented remarkable performance characterized by sharper and narrower peaks while secondary peaks were attenuated, which was maintained even in the presence of intense levels of noise. In particular, the two methods derived from the coincidence index led to particularly interesting results. The cross-correlation, however, presented the best robustness to symmetric additive noise, which suggested a new combination of the considered approaches. After a preliminary investigation of the relative performance of the multiset approaches, as well as the classic cross-correlation, a systematic comparison framework is proposed and applied for the study of the aforementioned methods. Several results are reported, including the confirmation, at least for the considered type of data, of the coincidence correlation as providing enhanced performance regarding detection of narrow, sharp peaks while secondary matches are duly attenuated. The combined method also proved promising for dealing with signals in the presence of intense additive noise.
Submitted 21 November, 2021; v1 submitted 8 November, 2021;
originally announced November 2021.
-
On Similarity
Authors:
Luciano da F. Costa
Abstract:
The objective quantification of similarity between two mathematical structures constitutes a recurrent issue in science and technology. In the present work, we developed a principled approach that took the Kronecker delta function of two scalar values as the prototypical reference for similarity quantification and then derived four more yielding indices, three of which are bounded between 0 and 1. Generalizations of these indices to take into account the sign of the scalar values were then presented and developed to multisets, vectors, and functions in real spaces. Several important results have been obtained, including the interpretation of the Jaccard index as a yielding implementation of the Kronecker delta function. When generalized to real functions, the four described similarity indices become respective functionals, which can then be employed to obtain associated operations of convolution and correlation.
Submitted 2 November, 2021;
originally announced November 2021.
-
Further Generalizations of the Jaccard Index
Authors:
Luciano da F. Costa
Abstract:
Quantifying the similarity between two mathematical structures or datasets constitutes a particularly interesting and useful operation in several theoretical and applied problems. Aimed at this specific objective, the Jaccard index has been extensively used in the most diverse types of problems, also motivating some respective generalizations. The present work addresses further generalizations of this index, including its modification into a coincidence index capable of accounting also for the level of relative interiority between the two compared entities, as well as respective extensions for sets in continuous vector spaces, the generalization to multiset addition, densities and generic scalar fields, as well as a means to quantify the joint interdependence between two random variables. The interesting possibility of taking into account more than two sets has also been addressed, including the description of an index capable of quantifying the level of chaining between three structures. Several of the described and suggested generalizations have been illustrated with respect to numeric case examples. It is also posited that these indices can play an important role while analyzing and integrating datasets in modeling approaches and pattern recognition activities, including as a measurement of cluster similarity or separation and as a resource for representing and analyzing complex networks.
Submitted 18 November, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
-
An Ample Approach to Data and Modeling
Authors:
Luciano da F. Costa
Abstract:
In the present work, we describe a framework for modeling how models can be built that integrates concepts and methods from a wide range of fields. The information schism between the real world and that which can be gathered and considered by any individual information processing agent is characterized and discussed, followed by the presentation of a series of requisites adopted while developing the modeling approach. The issue of mapping from datasets into models is subsequently addressed, as well as some of the respectively implied difficulties and limitations. Based on these considerations, an approach to meta modeling how models are built is then progressively developed. First, the reference M* meta model framework is presented, which relies critically on associating whole datasets and respective models in terms of a strict equivalence relation. Among the interesting features of this model are its ability to bridge the gap between data and modeling, as well as paving the way to an algebra of both data and models which can be employed to combine models in a hierarchical manner. After illustrating the M* model in terms of patterns derived from regular lattices, the reported modeling approach continues by discussing how sampling issues, error and overlooked data can be addressed, leading to the $M^{<ε>}$ variant, illustrated respectively to number theory. The situation in which the data needs to be represented in terms of respective probability densities is treated next, yielding the $M^{<σ>}$ meta model, which is then illustrated respectively to a real-world dataset (iris flowers data). Several considerations about how the developed framework can provide insights about data clustering, complexity, collaborative research, deep learning, and creativity are then presented, followed by overall conclusions.
Submitted 12 October, 2021; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Unraveling the graph structure of tabular data through Bayesian and spectral analysis
Authors:
Bruno Messias F. de Resende,
Eric K. Tokuda,
Luciano da Fontoura Costa
Abstract:
In the big-data age, tabular data are being generated and analyzed everywhere. As a consequence, finding and understanding the relationships between the features in these data are of great relevance. Here, to encompass these relationships, we propose a graph-based method that allows individual, group and multi-scale analyses. The method starts by mapping the tabular data into a weighted directed graph using the Shapley additive explanations technique. With this graph of relationships, we show that the inference of the hierarchical modular structure obtained by the Nested Stochastic Block Model (nSBM) as well as the study of the spectral space of the magnetic Laplacian can help us identify the classes of features and unravel non-trivial relationships. As a case study, we analyzed a socioeconomic survey conducted with students in Brazil: the PeNSE survey. The spectral embedding of the columns suggested that questions related to physical activities form a separate group. The application of the nSBM approach not only corroborated that finding but also allowed complementary findings about the modular structure: some groups of questions showed a high adherence to the divisions qualitatively defined by the designers of the survey. As opposed to the structure obtained by the spectrum, questions from the class Safety were partly grouped by our method in the class Drugs. Surprisingly, by inspecting these questions, we observed that they were related to both these topics, suggesting an alternative interpretation of these questions. These results show how our method can provide guidance for tabular data analysis as well as the design of future surveys.
Submitted 7 January, 2023; v1 submitted 4 October, 2021;
originally announced October 2021.
-
A pattern recognition approach for distinguishing between prose and poetry
Authors:
Henrique F. de Arruda,
Sandro M. Reia,
Filipi N. Silva,
Diego R. Amancio,
Luciano da F. Costa
Abstract:
Poetry and prose are written artistic expressions that help us to appreciate the reality we live in. Each of these styles has its own set of subjective properties, such as rhyme and rhythm, which are easily caught by a human reader's eye and ear. With the recent advances in artificial intelligence, the gap between humans and machines may have decreased, and today we observe algorithms mastering tasks that were once exclusively performed by humans. In this paper, we propose an automated method to distinguish between poetry and prose based solely on aural and rhythmic properties. In order to compare prose and poetry rhythms, we represent the rhymes and phones as temporal sequences and thus we propose a procedure for extracting rhythmic features from these sequences. The classification of the considered texts using the set of features extracted resulted in a best accuracy of 0.78, obtained with a neural network. Interestingly, by using an approach based on complex networks to visualize the similarities between the different texts considered, we found that the patterns of poetry vary much more than those of prose. Consequently, a much richer and more complex set of rhythmic possibilities tends to be found in that modality.
Submitted 18 July, 2021;
originally announced July 2021.
-
A keyword-driven approach to science
Authors:
Henrique Ferraz de Arruda,
Luciano da Fontoura Costa
Abstract:
To a good extent, words can be understood as corresponding to patterns or categories that appeared in order to represent concepts and structures that are particularly important or useful in a given time and space. Words are characterized by being neither completely general nor completely specific, in the sense that the same word can be instantiated or related to several different contexts, depending on specific situations. Indeed, the way in which words are instantiated and associated represents a particularly interesting aspect that can substantially help to better understand the context in which they are employed. Scientific words are no exception to that. In the present work, we approach the associations between a set of particularly relevant words in the sense of being not only frequently used in several areas, but also representing concepts that are currently related to some of the main standing challenges in science. More specifically, the study reported here takes into account the words "prediction", "model", "optimization", "complex", "entropy", "random", "deterministic", "pattern", and "database". In order to complement the analysis, we also obtain a network representing the relationship between the adopted areas. Many interesting results were found. First and foremost, several of the words were observed to have markedly distinct associations in different areas. Biology was found to be related to computer science, sharing associations with databases. Furthermore, for most of the cases, the words "complex", "model", and "prediction" were observed to have several strong associations.
Submitted 19 July, 2021; v1 submitted 31 May, 2021;
originally announced June 2021.
-
On the Stability of Citation Networks
Authors:
Alexandre Benatti,
Henrique Ferraz de Arruda,
Filipi Nascimento Silva,
César H. Comin,
Luciano da Fontoura Costa
Abstract:
Citation networks can reveal much important information regarding the development of science and the relationship between different areas of knowledge. Thus, many studies have analyzed the topological properties of such networks. Frequently, citation networks are created using articles acquired from a set of relevant keywords or queries. Here, we study the robustness of citation networks with regard to the keywords that were used for collecting the respective articles. A perturbation approach is proposed, in which the influence of missing keywords on the topology and community structure of citation networks is quantified. In addition, the relationship between keywords and the community structure of citation networks is studied using networks generated from a simple model. We find that, owing to their highly modular structure, the community structure of citation networks tends to be preserved even when many relevant keywords are left out. Furthermore, the proposed model can reflect the impact of missing keywords in different situations.
Submitted 4 May, 2021;
originally announced May 2021.
-
Complex Networks of Functions
Authors:
Luciano da F. Costa
Abstract:
Functions correspond to one of the key concepts in mathematics and science, allowing the representation and modeling of several types of signals and systems. The present work develops an approach for characterizing the coverage and interrelationship between discrete signals that can be fitted by a set of reference functions, allowing the definition of transition networks between the considered discrete signals. While the adjacency between discrete signals is defined in terms of respective Euclidean distances, the property of being adjustable by the reference functions provides an additional constraint leading to a surprising diversity of transition network topologies. First, we motivate the possibility to define transitions between parametric continuous functions, a concept that is subsequently extended to discrete functions and signals. Given that the set of all possible discrete signals in a bound region corresponds to a finite number of cases, it becomes feasible to verify the adherence of each of these signals with respect to a reference set of functions. Then, by taking into account also the Euclidean proximity between those discrete signals found to be adjustable, it becomes possible to obtain a respective transition network that not only can be used to study the properties and interrelationships of the involved discrete signals as underlain by the reference functions, but also provides an interesting complex network theoretical model in itself, presenting a surprising diversity of topological features, including modular organization coexisting with more uniform portions, tails and handles, as well as hubs. Examples of the proposed concepts and methodologies are provided with respect to three case examples involving power, sinusoidal and polynomial functions.
Submitted 4 February, 2021;
originally announced February 2021.
-
Modeling how social network algorithms can influence opinion polarization
Authors:
Henrique F. de Arruda,
Felipe M. Cardoso,
Guilherme F. de Arruda,
Alexis R. Hernández,
Luciano da F. Costa,
Yamir Moreno
Abstract:
Among different aspects of social networks, dynamics have been proposed to simulate how opinions can be transmitted. In this study, we propose a model that simulates the communication in an online social network, in which the posts are created from external information. We considered the nodes and edges of a network as users and their friendship, respectively. A real number is associated with each user representing its opinion. The dynamics starts with a user that has contact with a random opinion, and, according to a given probability function, this individual can post this opinion. This step is henceforth called post transmission. In the next step, called post distribution, another probability function is employed to select the user's friends that could see the post. Post transmission and distribution represent the user and the social network algorithm, respectively. If an individual has contact with a post, its opinion can be attracted or repulsed. Furthermore, individuals that are repulsed can change their friendship through a rewiring. These steps are executed various times until the dynamics converge. Several impressive results were obtained, which include the formation of scenarios of polarization and consensus of opinions. In the case of echo chambers, the possibility of rewiring probability is found to be decisive. However, for particular network topologies, with a well-defined community structure, this effect can also happen. All in all, the results indicate that the post distribution strategy is crucial to mitigate or promote polarization.
Submitted 29 January, 2021;
originally announced February 2021.
-
Transistors: A Network Science-Based Historical Perspective
Authors:
Alexandre Benatti,
Henrique Ferraz de Arruda,
Filipi Nascimento Silva,
Luciano da Fontoura Costa
Abstract:
The development of modern electronics was to a large extent related to the advent and popularization of bipolar junction technology. The present work applies science of science concepts and methodologies in order to develop a relatively systematic, quantitative study of the development of electronics from a bipolar-junction-centered perspective. First, we searched the adopted dataset (Microsoft Academic Graph) for entries related to "bipolar junction transistor". Community detection was then applied in order to derive sub-areas, which were tentatively labeled into 10 overall groups. This modular graph was then studied from several perspectives, including topological measurements and time evolution. A number of interesting results are reported, including a good level of thematic coherence within each identified area, as well as the identification of distinct periods along the time evolution including the onset and coming of age of bipolar junction technology and related areas. A particularly surprising result was the verification of stable interrelationship between the identified areas along time.
Submitted 18 August, 2020; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Characterization and comparison of large directed graphs through the spectra of the magnetic Laplacian
Authors:
Bruno Messias F. de Resende,
Luciano da F. Costa
Abstract:
In this paper we investigated the possibility to use the magnetic Laplacian to characterize directed graphs (a.k.a. networks). Many interesting results are obtained, including the finding that community structure is related to rotational symmetry in the spectral measurements for a type of stochastic block model. Due to the Hermiticity property of the magnetic Laplacian, we show here how to scale our approach to larger networks containing hundreds of thousands of nodes using the Kernel Polynomial Method (KPM). We also propose to combine the KPM with the Wasserstein metric in order to measure distances between networks even when these networks are directed, large and have different sizes, a hard problem which cannot be tackled by previous methods presented in the literature. In addition, our Python package is publicly available at https://github.com/stdogpkg/emate. The codes can run on both CPU and GPU and can estimate the spectral density and related trace functions, such as entropy and Estrada index, even in directed or undirected networks with millions of nodes.
Submitted 7 July, 2020;
originally announced July 2020.
-
Revisiting Agglomerative Clustering
Authors:
Eric K. Tokuda,
Cesar H. Comin,
Luciano da F. Costa
Abstract:
An important issue in clustering concerns the avoidance of false positives while searching for clusters. This work addressed this problem considering agglomerative methods, namely single, average, median, complete, centroid and Ward's approaches applied to unimodal and bimodal datasets obeying uniform, gaussian, exponential and power-law distributions. A model of clusters was also adopted, involving a higher density nucleus surrounded by a transition, followed by outliers. This paved the way to defining an objective means for identifying the clusters from dendrograms. The adopted model also allowed the relevance of the clusters to be quantified in terms of the height of their subtrees. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters not corresponding directly to the nucleus. The possibility of identifying the type of distribution was also investigated.
Submitted 26 June, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Classification of abrupt changes along viewing profiles of scientific articles
Authors:
Ana C. M. Brito,
Filipi N. Silva,
Henrique F. de Arruda,
Cesar H. Comin,
Diego R. Amancio,
Luciano da F. Costa
Abstract:
With the expansion of electronic publishing, a new dynamics of scientific article dissemination was initiated. Nowadays, many works are widely disseminated even before publication, in the form of preprints. Another important new element concerns the views of published articles. Thanks to the availability of the respective data from some journals, such as PLoS ONE, it became possible to investigate how scientific works are viewed over time, often before the first citations appear. This provides the main theme of the present work. More specifically, our research was motivated by preliminary observations that the view profiles over time tend to present a piecewise linear nature. A methodology was then delineated in order to identify the main segments in the view profiles, which allowed several related measurements to be derived. In particular, we focused on the inclination and length of each subsequent segment. Basic statistics indicated that the inclination can vary substantially along subsequent segments, while the segment lengths were more stable. Complementary joint statistical analysis, considering pairwise correlations, provided further information about the properties of the views. In order to better understand the view profiles, we performed multivariate statistical analysis, including principal component analysis and hierarchical clustering. The results suggest that a portion of the polygonal views are organized into clusters or groups. These groups were characterized in terms of prototypes indicating the relative increase or decrease along subsequent segments. Four distinct models were then developed for representing the observed segments. It was found that models incorporating joint dependencies between the properties of the segments provided the most accurate results among the considered alternatives.
Submitted 8 October, 2020; v1 submitted 9 May, 2020;
originally announced May 2020.
-
Shortest Paths in Complex Networks: Structure and Optimization
Authors:
Guilherme S. Domingues,
Cesar H. Comin,
Luciano da F. Costa
Abstract:
Among the several topological properties of complex networks, the shortest path represents a particularly important characteristic because of its potential impact not only on other topological properties, but mainly on several dynamical processes taking place on the network. In addition, several practical situations, such as transit in cities, can benefit from modifying a network so as to reduce the respective shortest paths. In the present work, we addressed the problem of trying to reduce the average shortest path of several theoretical and real-world complex networks by adding a given number of links according to different strategies. More specifically, we considered: placing new links between nodes with relatively low and high degrees; enhancing the degree regularity of the network; applying preferential attachment according to the degree; linking nodes with relatively low and high betweenness centrality; and linking nodes with relatively low/low, low/high, and high/high accessibilities. Several interesting results have been obtained, including the identification of the accessibility-based strategies as providing the largest reduction of the average shortest path length. Another interesting finding is that, for several types of networks, the degree-based methods tend to provide improvements comparable to those obtained by using the much more computationally expensive betweenness centrality measurement.
Submitted 26 March, 2020;
originally announced March 2020.
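As a minimal illustration of the quantity targeted in the abstract above, the following Python sketch measures the average shortest path length of a small unweighted graph before and after one link is added. The toy path graph, the choice of link, and the function names are assumptions made for illustration only; they do not reproduce any of the strategies evaluated in the paper.

```python
from collections import deque


def average_shortest_path(adj):
    """Mean hop distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


def add_link(adj, u, v):
    """Return a copy of the graph with the undirected link (u, v) added."""
    new_adj = {node: set(neigh) for node, neigh in adj.items()}
    new_adj[u].add(v)
    new_adj[v].add(u)
    return new_adj


# Hypothetical toy network: a path 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
before = average_shortest_path(path)

# Illustrative choice: link the two degree-1 end nodes, closing a ring.
after = average_shortest_path(add_link(path, 0, 4))
```

Here a single added link lowers the average from 2.0 to 1.5. A strategy comparison would repeat this measurement for links chosen by each criterion.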
-
Toward Generalized Clustering through an One-Dimensional Approach
Authors:
Luciano da F. Costa
Abstract:
After generalizing the concept of clusters to incorporate clusters that are linked to other clusters through some relatively narrow bridges, an approach for detecting patches of separation between these clusters is developed based on an agglomerative clustering, more specifically the single-linkage, applied to one-dimensional slices obtained from respective feature spaces. The potential of this method is illustrated with respect to the analyses of clusterless uniform and normal distributions of points, as well as a one-dimensional clustering model characterized by two intervals with high density of points separated by a less dense interstice. This partial clustering method is then considered as a means of feature selection and cluster identification, and two simple but potentially effective respective methods are described and illustrated with respect to some hypothetical situations.
Submitted 1 January, 2020;
originally announced January 2020.
-
Distance-Based Network Partitioning
Authors:
Paulo J. P. de Souza,
Cesar H. Comin,
Luciano da F. Costa
Abstract:
A new method for identifying communities in networks is proposed. Reference nodes, either selected using a priori information about the network or according to relevant node measurements, are obtained so as to indicate putative communities. Distance vectors between each network node and the reference nodes are then used for defining a coordinate system representing the community structure of the network at many different scales. For modular networks, the distribution of nodes in this space often results in a well-separated clustered structure, with each cluster corresponding to a community. One interesting feature of the reported methodology for community finding is that the coordinate system defined by the seeds allows an intuitive and direct interpretation of the situation of each node with respect to the considered communities. The potential of the method is illustrated with respect to a community detection benchmark, a spatial network model, and city street networks.
Submitted 4 November, 2019;
originally announced November 2019.
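The coordinate system described in the abstract above can be sketched roughly as follows: each node is mapped to the vector of its hop distances to a set of reference nodes. The toy graph, the chosen reference nodes, and the final nearest-reference assignment are assumptions added for illustration; the abstract does not specify how clusters are extracted from the coordinate space.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_coordinates(adj, references):
    """Map each node to its tuple of distances to the reference nodes."""
    per_ref = [bfs_distances(adj, r) for r in references]
    return {
        node: tuple(d.get(node, float("inf")) for d in per_ref)
        for node in adj
    }


def assign_to_nearest(coords):
    """Hypothetical hard assignment: index of the closest reference node."""
    return {
        node: min(range(len(c)), key=lambda i: c[i])
        for node, c in coords.items()
    }


# Hypothetical modular toy network: two triangles joined by the bridge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
coords = distance_coordinates(graph, references=[0, 5])
labels = assign_to_nearest(coords)
```

In this toy case the two triangles separate cleanly in the two-dimensional distance space, e.g. node 2 maps to (1, 2) and node 3 to (2, 1).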
-
How Coupled are Mass Spectrometry and Capillary Electrophoresis?
Authors:
Caroline Ceribeli,
Henrique F. de Arruda,
Luciano da F. Costa
Abstract:
The understanding of how science works can contribute to making scientific development more effective. In this paper, we report an analysis of the organization and interconnection between two important issues in chemistry, namely mass spectrometry (MS) and capillary electrophoresis (CE). For that purpose, we employed science of science techniques based on complex networks. More specifically, we considered a citation network in which the nodes and connections represent papers and citations, respectively. Interesting results were found, including a good separation between some clusters of articles devoted to instrumentation techniques and applications. However, the papers that describe CE-MS did not lead to a well-defined cluster. In order to better understand the organization of the citation network, we considered a multi-scale analysis, in which we used the information regarding sub-clusters. Firstly, we analyzed the sub-cluster of the first article devoted to the coupling between CE and MS, which was found to be a good representation of its sub-cluster. The second analysis was about the sub-cluster of a seminal paper known to be the first that dealt with proteins by using CE-MS. By considering the proposed methodologies, our paper paves the way for researchers working with both techniques, since it elucidates the knowledge organization and can therefore lead to better literature reviews.
Submitted 18 October, 2019;
originally announced October 2019.
-
Syntonets: Toward A Harmony-Inspired General Model of Complex Networks
Authors:
Luciano da Fontoura Costa,
Henrique Ferraz de Arruda
Abstract:
We report an approach to obtaining complex networks with diverse topology, here called syntonets, taking into account the consonances and dissonances between notes as defined by scale temperaments. Though the fundamental frequency is usually considered, in real-world sounds several additional frequencies (partials) accompany the respective fundamental, influencing both timbre and consonance between simultaneous notes. We use a method based on Helmholtz's consonance approach to quantify the consonances and dissonances between each of the pairs of notes in a given temperament. We adopt two distinct partials structures: (i) harmonic; and (ii) shifted, obtained by taking the harmonic components to a given power $β$, which is henceforth called the anharmonicity index. The latter type of sounds is more realistic in the sense that they reflect non-linearities implied by real-world instruments. When these consonances/dissonances are estimated along several octaves, respective syntonets can be obtained, in which nodes and weighted edges represent notes and consonance/dissonance, respectively. The obtained results are organized into two main groups, those related to network science and those related to musical theory. Regarding the former group, the syntonets can provide, for varying values of $β$, a wide range of topologies spanning the space comprised between traditional models. Indeed, it is suggested here that syntony may provide a kind of universal complex network model. The musical interpretations of the results include the confirmation of the more regular consonance pattern of the equal temperament, obtained at the expense of a wider range of consonances such as that in the meantone temperament. We also find that scales derived for shifted partials tend to have a wider range of consonances/dissonances, depending on the temperament and anharmonicity strength.
Submitted 11 May, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Contrarian effects and echo chamber formation in opinion dynamics
Authors:
Henrique Ferraz de Arruda,
Alexandre Benatti,
Filipi Nascimento Silva,
Cesar Henrique Comin,
Luciano da Fontoura Costa
Abstract:
The relationship between the topology of a network and specific types of dynamics unfolding in networks constitutes a subject of substantial interest. One type of dynamics that has attracted increasing attention because of its several potential implications is opinion formation. A phenomenon of particular importance, known to take place in opinion formation, is the appearance of echo chambers. In the present work, we approach this phenomenon, while emphasizing the influence of contrarian opinions in a multi-opinion scenario. To define the contrarian opinion, we considered the Underdog effect, which is the eventual tendency of people to support the less popular option. We also considered an adaptation of the Sznajd dynamics with the possibility of friendship rewiring, performed on several network models. We analyze the relationship between topology and opinion dynamics by considering two measurements: opinion diversity and network modularity. Two specific situations have been addressed: (i) the agents can reconnect only with others sharing the same opinion; and (ii) same as in the previous case, but with the agents reconnecting only within a limited neighborhood. This choice can be justified because, in general, friendship is a transitive property along with subsequent neighborhoods (e.g., two friends of a person tend to know each other). As the main results, we found that the Underdog effect, if strong enough, can balance the agents' opinions. On the other hand, this effect decreases the possibilities of echo-chamber formation. We also found that the restricted reconnection case reduced the chances of echo chamber formation and led to smaller echo chambers.
Submitted 11 November, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Modeling Consonance and its Relationships with Temperament, Harmony, and Electronic Amplification
Authors:
Luciano da Fontoura Costa
Abstract:
After briefly reviewing the concepts of consonance/dissonance, a respective mathematical-computational model is described, based on Helmholtz's consonance theory and also considering the intensity of the partials. It is then applied to characterize five scale temperaments, as well as some minor and major triads and electronic amplification. In spite of the simplicity of the described model, a surprising agreement is often observed between the obtained consonances/dissonances and the typically observed properties of scales and chords. The representation of temperaments as graphs where links correspond to consonance (or dissonance) is presented and used to compare distinct temperaments, allowing the identification of two main groups of scales. The interesting issue of nonlinearities in electronic music amplification is also addressed while considering quadratic distortions, and it is shown that such nonlinearities can have a drastic effect in changing the original patterns of consonance and dissonance.
Submitted 15 June, 2019;
originally announced June 2019.
-
Cost-Based Approach to Complexity: A Common Denominator?
Authors:
Luciano da F. Costa,
Guilherme S. Domingues
Abstract:
Complexity remains one of the central challenges in science and technology. Although several approaches to defining and/or quantifying complexity have been proposed, at some point each of them seems to run into intrinsic limitations or mutual disagreement. The present work has two main objectives: (i) to review some of the main approaches to complexity; and (ii) to suggest a cost-based approach that, to a great extent, can be understood as an integration of the several facets of complexity while keeping its meaning for humans in mind. More specifically, it is posited that complexity, an inherently relative and subjective concept, can be summarized as the cost of developing a model, plus the cost of its respective operation. As a consequence, complexity can vary over time and space. The proposal is illustrated with several application examples, including a real-data-based situation.
Submitted 4 October, 2021; v1 submitted 11 May, 2019;
originally announced May 2019.
-
Interdisciplinary Relationships Between Biological and Physical Sciences
Authors:
Paulo E. P. Burke,
Luciano da F. Costa
Abstract:
Several interdisciplinary areas have appeared at the interface between biological and physical sciences. In this work, we suggest a complex network-based methodology for analyzing the interrelationships between some of these interdisciplinary areas, including Bioinformatics, Computational Biology, Biochemistry, among others. This approach has been applied over respective data derived from Wikipedia. Related reviews from the scientific literature are also considered as a reference, yielding a respective bipartite hypergraph which can be used to gain insights about the interrelationships underlying the considered interdisciplinary areas. Several interesting results are obtained, including greater interconnection between the considered interdisciplinary areas with biological than with physical sciences. A good agreement was also found between the network obtained from Wikipedia and the interrelationships revealed by the literature reviews. At the same time, the former network was found to exhibit more intricate relationships than in the hypergraph derived from the literature review.
Submitted 8 May, 2019;
originally announced May 2019.
-
Opinion Diversity and Social Bubbles in Adaptive Sznajd Networks
Authors:
Alexandre Benatti,
Henrique Ferraz de Arruda,
Filipi Nascimento Silva,
Cesar Henrique Comin,
Luciano da Fontoura Costa
Abstract:
Among the several approaches that have been attempted for studying opinion dynamics, the Sznajd model provides some particularly interesting features, such as its simplicity and ability to represent some of the mechanisms believed to be involved in opinion dynamics. The standard Sznajd model at zero temperature is characterized by converging to one stable state, implying null diversity of opinions. In the present work, we develop an approach, namely the adaptive Sznajd model, in which changes of opinion by an individual (i.e. a network node) imply possible alterations in the network topology. This is accomplished by allowing agents to change their connections preferentially to other neighbors with the same state. The diversity of opinions over time is quantified in terms of the exponential of the entropy of the opinion density. Several interesting results are reported, including the possible formation of echo chambers or social bubbles. Additionally, depending on the parameter configuration, the dynamics may converge to different equilibrium states for the same parameter setting, which suggests that this phenomenon can be a phase transition. The average degree of the network strongly influences the resultant opinion distribution, which means that echo chambers are easily formed in less connected systems.
Submitted 2 August, 2019; v1 submitted 2 May, 2019;
originally announced May 2019.
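The diversity measurement mentioned in the abstract above, the exponential of the entropy of the opinion density, can be sketched in a few lines of Python. The use of the natural-logarithm Shannon entropy, the function name, and the toy opinion lists are assumptions made for this illustration.

```python
import math
from collections import Counter


def opinion_diversity(opinions):
    """Exponential of the Shannon entropy of the opinion frequencies.

    The value can be read as an effective number of opinions: it equals 1
    under full consensus and k when k opinions are equally frequent.
    """
    counts = Counter(opinions)
    total = len(opinions)
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return math.exp(entropy)


# Hypothetical opinion states held by the agents of a small network.
uniform = opinion_diversity(["A", "B", "C", "D"])
consensus = opinion_diversity(["A"] * 10)
skewed = opinion_diversity(["A"] * 9 + ["B"])
```

With four equally frequent opinions the diversity is 4, while full consensus gives 1, which corresponds to the null diversity case described for the standard model. The skewed example falls strictly between 1 and 2.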
-
Characterization and space embedding of directed graphs and social networks through magnetic Laplacians
Authors:
Bruno Messias,
Luciano da F. Costa
Abstract:
Though commonly found in the real world, directed networks have received relatively less attention in the literature concerning their topological and dynamical characteristics. In this work, we develop a magnetic Laplacian-based framework that can be used for studying directed complex networks. More specifically, we introduce a specific heat measurement that can help to characterize the network topology. It is shown that, by using this approach, it is possible to identify the types of several networks, as well as to infer parameters underlying specific network configurations. Then, we consider the dynamics associated with the magnetic Laplacian as a means of embedding networks into a metric space, allowing the identification of mesoscopic structures in artificial networks or the unraveling of polarization in the political blogosphere. By defining a coarse-graining procedure in this metric space, we show how to connect the specific heat measurement and the positions of nodes in this space.
Submitted 9 December, 2018; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Malleability of complex networks
Authors:
Filipi N. Silva,
Cesar H. Comin,
Luciano da F. Costa
Abstract:
Most complex networks are not static, but evolve along time. Given a specific configuration of one such changing network, it becomes a particularly interesting issue to quantify the diversity of possible unfoldings of its topology. In this work, we suggest the concept of malleability of a network, which is defined as the exponential of the entropy of the probabilities of each possible unfolding with respect to a given configuration. We calculate the malleability with respect to specific measurements of the involved topologies. More specifically, we identify the possible topologies derivable from a given configuration and calculate some topological measurement of them (e.g. clustering coefficient, shortest path length, assortativity, etc.), leading to respective probabilities being associated to each possible measurement value. Though this approach implies some level of degeneracy in the mapping from topology to measurement space, it still paves the way to inferring the malleability of specific network types with respect to given topological measurements. We report that the malleability, in general, depends on each specific measurement, with the average shortest path length and degree assortativity typically leading to large malleability values. The maximum malleability was observed for the Wikipedia network and the minimum for the Watts-Strogatz model.
Submitted 22 October, 2018;
originally announced October 2018.
-
Pattern Recognition Approach to Violin Shapes of MIMO database
Authors:
Thomas Peron,
Francisco A. Rodrigues,
Luciano da F. Costa
Abstract:
Since the landmarks established by the Cremonese school in the 16th century, the history of violin design has been marked by experimentation. While great effort has been invested since the early 19th century by the scientific community on researching violin acoustics, substantially less attention has been given to the statistical characterization of how the violin shape evolved over time. In this paper we study the morphology of violins retrieved from the Musical Instrument Museums Online (MIMO) database -- the largest freely accessible platform providing information about instruments held in public museums. From the violin images, we derive a set of measurements that reflect relevant geometrical features of the instruments. The application of Principal Component Analysis (PCA) uncovered similarities between violin makers and their respective copyists, as well as among luthiers belonging to the same family lineage, in the context of historical narrative. Combined with a time-windowed approach, thin plate splines visualizations revealed that the average violin outline has remained mostly stable over time, not adhering to any particular trends of design across different periods in music history.
Submitted 8 August, 2018;
originally announced August 2018.
-
Paragraph-based complex networks: application to document classification and authenticity verification
Authors:
Henrique F. de Arruda,
Vanessa Q. Marinho,
Luciano da F. Costa,
Diego R. Amancio
Abstract:
With the increasing number of texts made available on the Internet, many applications have relied on text mining tools to tackle a diversity of problems. A relevant model to represent texts is the so-called word adjacency (co-occurrence) representation, which is known to capture mainly syntactical features of texts. In this study, we introduce a novel network representation that considers the semantic similarity between paragraphs. Two main properties of paragraph networks are considered: (i) their ability to incorporate characteristics that can discriminate real from artificial, shuffled manuscripts and (ii) their ability to capture syntactical and semantic textual features. Our results revealed that real texts are organized into communities, which turned out to be an important feature for discriminating them from artificial texts. Interestingly, we have also found that, differently from traditional co-occurrence networks, the adopted representation is able to capture semantic features. Additionally, the proposed framework was employed to analyze the Voynich manuscript, which was found to be compatible with texts written in natural languages. Taken together, our findings suggest that the proposed methodology can be combined with traditional network models to improve text classification tasks.
Submitted 21 June, 2018;
originally announced June 2018.
-
Principal Component Analysis: A Natural Approach to Data Exploration
Authors:
Felipe L. Gewers,
Gustavo R. Ferreira,
Henrique F. de Arruda,
Filipi N. Silva,
Cesar H. Comin,
Diego R. Amancio,
Luciano da F. Costa
Abstract:
Principal component analysis (PCA) is often used for analyzing data in the most diverse areas. In this work, we report an integrated approach to several theoretical and practical aspects of PCA. We start by providing, in an intuitive and accessible manner, the basic principles underlying PCA and its applications. Next, we present a systematic, though not exhaustive, survey of some representative works illustrating the potential of PCA applications to a wide range of areas. An experimental investigation of the ability of PCA for variance explanation and dimensionality reduction is also developed, which confirms the efficacy of PCA and also shows that whether or not the original data are standardized can have important effects on the obtained results. Overall, we believe the several covered issues can assist researchers from the most diverse areas in using and interpreting PCA.
Submitted 19 June, 2018; v1 submitted 6 April, 2018;
originally announced April 2018.
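The remark in the abstract above about standardization can be illustrated with a small two-feature example, in which the fraction of variance explained by the first principal component is computed with and without standardizing the data. The sample values and function names are assumptions chosen for illustration, and the closed-form eigenvalue used here applies only to the two-dimensional case.

```python
import math
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def first_pc_variance_ratio(xs, ys):
    """Share of total variance captured by the first principal component.

    Uses the closed-form largest eigenvalue of the 2x2 covariance matrix
    [[a, b], [b, c]].
    """
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return largest / (a + c)


def standardize(values):
    """Rescale to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical features measured on very different scales.
small_scale = [1, 2, 3, 4, 5]
large_scale = [100, 300, 200, 500, 400]

raw_ratio = first_pc_variance_ratio(small_scale, large_scale)
std_ratio = first_pc_variance_ratio(
    standardize(small_scale), standardize(large_scale)
)
```

Without standardization the large-scale feature dominates and the first component explains almost all of the variance. After standardization the ratio drops to about 0.9, reflecting only the 0.8 correlation between the two features.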
-
The Dynamics of Knowledge Acquisition via Self-Learning in Complex Networks
Authors:
Thales S. Lima,
Henrique F. de Arruda,
Filipi N. Silva,
Cesar H. Comin,
Diego R. Amancio,
Luciano da F. Costa
Abstract:
Studies regarding knowledge organization and acquisition are of great importance for understanding areas related to science and technology. A common way to model the relationship between different concepts is through complex networks. In such representations, network nodes store knowledge and edges represent their relationships. Several studies that considered this type of structure and knowledge acquisition dynamics employed one or more agents to discover node concepts by walking on the network. In this study, we investigate a different type of dynamics considering a single node as the "network brain". Such a brain represents a range of real systems such as the information about the environment that is acquired by a person and is stored in the brain. To store the discovered information in a specific node, the agents walk on the network and return to the brain. We propose three different dynamics and test them on several network models and on a real system, which is formed by journal articles and their respective citations. Surprisingly, the results revealed that, according to the adopted walking models, the efficiency of self-knowledge acquisition has only a weak dependency on the topology, search strategy and localization of the network brain.
Submitted 27 February, 2018; v1 submitted 26 February, 2018;
originally announced February 2018.
-
Note: Distance-Based Network Partitioning
Authors:
Paulo J. P. de Souza,
Cesar H. Comin,
Luciano da F. Costa
Abstract:
A new method for identifying soft communities in networks is proposed. Reference nodes, either selected using a priori information about the network or according to relevant node measurements, are obtained. Distance vectors between each network node and the reference nodes are then used for defining a multidimensional coordinate system representing the community structure of the network at many different scales. For modular networks, the distribution of nodes in this space often results in a well-separated clustered structure, with each cluster corresponding to a community. The potential of the method is illustrated with respect to a spatial network model and Zachary's karate club network.
Submitted 1 February, 2018;
originally announced February 2018.